Jaycee Kaufman, Jouhyun Jeon, Jessica Oreskovic, Anirudh Thommandram, Yan Fossat
Background: Identifying subtle changes in the menstrual cycle is crucial for effective fertility tracking and understanding reproductive health.
Objective: The aim of the study is to explore how fundamental frequency features vary between menstrual phases using daily voice recordings.
Methods: This study analyzed smartphone-collected voice recordings from 16 naturally cycling female participants, collected every day for 1 full menstrual cycle. Fundamental frequency features (mean, SD, 5th percentile, and 95th percentile) were extracted from each voice recording. Ovulation was estimated using luteinizing hormone urine tests taken every morning. The analysis compared these features between the follicular and luteal phases and applied changepoint detection algorithms to assess changes and pinpoint the day on which shifts in vocal pitch occurred.
Results: The fundamental frequency SD was 9.0% (SD 2.9%) lower in the luteal phase compared to the follicular phase (95% CI 3.4%-14.7%; P=.002), and the 5th percentile of the fundamental frequency was 8.8% (SD 3.6%) higher (95% CI 1.7%-16.0%; P=.01). No significant differences were found between phases in mean fundamental frequency or the 95th percentile of the fundamental frequency (P=.65 and P=.07). Changepoint detection, applied separately to each feature, identified the point in time when vocal frequency behaviors shifted. For the fundamental frequency SD and 5th percentile, 81% (n=13) of participants exhibited shifts within the fertile window (P=.03). In comparison, only 63% (n=10; P=.24) and 50% (n=8; P=.50) of participants had shifts in the fertile window for the mean and 95th percentile of the fundamental frequency, respectively.
Conclusions: These findings indicate that subtle variations in vocal pitch may reflect changes associated with the menstrual cycle, suggesting the potential for developing a noninvasive and convenient method for monitoring reproductive health. Changepoint detection may provide a promising avenue for future work in longitudinal fertility analysis.
{"title":"Longitudinal Changes in Pitch-Related Acoustic Characteristics of the Voice Throughout the Menstrual Cycle: Observational Study.","authors":"Jaycee Kaufman, Jouhyun Jeon, Jessica Oreskovic, Anirudh Thommandram, Yan Fossat","doi":"10.2196/65448","DOIUrl":"10.2196/65448","url":null,"PeriodicalId":14841,"journal":{"name":"JMIR Formative Research","volume":"9 ","pages":"e65448"},"PeriodicalIF":2.0,"publicationDate":"2025-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11737864/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142949179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
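The changepoint analysis in the abstract above can be illustrated with a minimal sketch. The abstract does not name the algorithm the authors used, so this example assumes the simplest variant: a single mean-shift changepoint chosen by least squares. The function name `single_changepoint` and the daily values are hypothetical and exist only for illustration.

```python
from statistics import fmean


def single_changepoint(values):
    """Return the index k (1 <= k < len(values)) that best splits the series
    into two segments with different means, using a least-squares cost.

    This is an assumed, simplified stand-in for the (unspecified)
    changepoint method used in the study.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    best_k, best_cost = None, float("inf")
    for k in range(1, n):
        left, right = values[:k], values[k:]
        mu_left, mu_right = fmean(left), fmean(right)
        # Sum of squared deviations from each segment's own mean
        cost = sum((x - mu_left) ** 2 for x in left)
        cost += sum((x - mu_right) ** 2 for x in right)
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


# Hypothetical daily fundamental frequency SD values (Hz), one per cycle day,
# with a drop beginning at index 14 (the 15th recording).
daily_f0_sd = [
    30.0, 31.0, 29.5, 30.5, 30.2, 29.8, 30.9, 30.1, 29.7, 30.4, 30.6, 29.9, 30.3, 30.0,
    27.0, 27.4, 26.8, 27.2, 27.1, 26.9, 27.3, 27.0, 26.7, 27.5, 27.2, 26.9, 27.1, 27.0,
]
shift_day = single_changepoint(daily_f0_sd)
print("Estimated shift at index:", shift_day)
```

In a per-participant analysis, the estimated index would then be compared with the fertile window derived from the luteinizing hormone tests; how that window is defined is not stated in the abstract.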
Katherine Scheffrahn, Claire Hall, Vanessa Muñiz, Gary Elkins
Background: Hypnotherapy has been shown to be a safe, nonhormonal intervention effective for treating menopausal hot flashes. However, women experiencing hot flashes may face accessibility barriers to in-person hypnotherapy. To address this issue, a smartphone app has been created to deliver hypnotherapy. The Evia app delivers audio-recorded hypnotherapy and has the potential to help individuals experiencing hot flashes.
Objective: This study aims to determine user outcomes in hot flash frequency and severity for users of the Evia app.
Methods: This study is a retrospective analysis of a dataset of Evia app users. Participants were divided into 2 groups for analysis. The first group reported daytime hot flashes and night sweats, while the second group was asked to report only daytime hot flashes. The participants in the first group (daytime hot flashes and night sweats) were 139 women with ≥3 daily hot flashes who downloaded the Evia app between November 6, 2021, and June 9, 2022, with a baseline mean of 8.330 (SD 3.977) daily hot flashes. The participants in the second group (daytime hot flashes) were 271 women with ≥3 daily hot flashes who downloaded the Evia app between June 10, 2022, and February 5, 2024, with a baseline mean of 6.040 (SD 3.282) daily hot flashes. The Evia intervention was a 5-week program for all participants, with daily tasks such as educational readings, hypnotic inductions, and daily hot flash tracking. The app uses audio-recorded hypnosis and mental imagery for coolness, such as imagery of a cool breeze, snow, or calmness.
Results: A clinically significant reduction in daily hot flashes, defined as a 50% reduction from baseline to the last logged Evia app survey, was experienced by 76.3% (106/139) of the women reporting hot flashes and night sweats and 56.8% (154/271) of the women reporting only daytime hot flashes. On average, the women reporting hot flashes and night sweats experienced a reduction of 61.4% (SD 33.185%) in their daytime and nighttime hot flashes while using the Evia app, and the women reporting only daytime hot flashes experienced a reduction of 45.2% (SD 42.567%) in their daytime hot flashes. In both groups, there was a large, statistically significant difference in the average number of daily hot flashes from baseline to end point (hot flashes and night sweats group: Cohen d=1.28; t(138)=15.055; P<.001; daytime hot flashes group: Cohen d=0.82; t(270)=13.555; P<.001).
Conclusions: Hypnotherapy is an efficacious intervention for hot flashes, with the potential to improve women's lives by reducing hot flashes without hormonal or pharmacological intervention. This study takes the first step in evaluating the efficacy of an app-delivered hypnosis intervention for menopausal hot flashes, demonstrating the Evia app provides a promising app delivery of hypnotherapy with potential to increase accessib
{"title":"User Outcomes for an App-Delivered Hypnosis Intervention for Menopausal Hot Flashes: Retrospective Analysis.","authors":"Katherine Scheffrahn, Claire Hall, Vanessa Muñiz, Gary Elkins","doi":"10.2196/63948","DOIUrl":"10.2196/63948","url":null,"PeriodicalId":14841,"journal":{"name":"JMIR Formative Research","volume":"9 ","pages":"e63948"},"PeriodicalIF":2.0,"publicationDate":"2025-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11757980/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142949098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
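The effect sizes reported in the Results above can be cross-checked against the t statistics. The sketch below assumes the reported Cohen d is the paired-samples effect size (mean difference divided by the SD of the differences), for which d = t / sqrt(n); the abstract does not state which variant was used, and the function name is hypothetical.

```python
import math


def paired_cohens_d_from_t(t_stat, n_pairs):
    """Paired-samples effect size recovered from a paired t statistic.

    Assumes d is defined as mean(diff) / sd(diff), so that d = t / sqrt(n).
    """
    if n_pairs <= 0:
        raise ValueError("n_pairs must be positive")
    return t_stat / math.sqrt(n_pairs)


# Values taken from the abstract: degrees of freedom 138 and 270 imply
# n = 139 and n = 271 paired observations.
d_night_sweats_group = paired_cohens_d_from_t(15.055, 139)
d_daytime_group = paired_cohens_d_from_t(13.555, 271)

print(round(d_night_sweats_group, 2))  # 1.28
print(round(d_daytime_group, 2))       # 0.82
```

Under that assumption, both values agree with the reported Cohen d of 1.28 and 0.82.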
Seul Ki Choi, Jaclyn Marshall, Patrina Sexton Topper, Andrew Pregnall, José Bauermeister
Background: While the significance of care navigation in facilitating access to health care within the lesbian, gay, bisexual, transgender, queer, and other (LGBTQ+) communities has been acknowledged, there is limited research examining how care navigation influences an individual's ability to understand and access the care they need in real-world settings. By analyzing private sector data, we can bridge the gap between theoretical research findings and practical applications, ultimately informing both business strategies and public policy with evidence grounded in real-world efficacy.
Objective: The objective of this study was to evaluate the impact of specialized virtual care navigation services on LGBTQ+ individuals' ability to comprehend and access necessary care within a national cohort of commercially insured members.
Methods: This case study is based on the experience of commercially insured members, aged 18 years or older, who used the LGBTQ+ Health Care Navigation (LGBTQ+ Navigation) service by Included Health between January 26 and July 31, 2023. Care coordinators assisted members by connecting them with vetted identity-affirming in-network providers, helping them navigate and understand their LGBTQ+ health benefits, and providing education and advocacy for clinical and nonclinical needs. We examined the impact of navigation on 5 member-reported outcomes. In addition to reporting the proportion who agreed or strongly agreed, we calculated an impact score for each respondent by averaging the numerical values assigned to all 5 question responses (1=strongly disagree to 5=strongly agree). We used ANOVA with Tukey post hoc tests and t tests to explore the relationships between the impact score and member characteristics, including optional self-reported demographics.
Results: Out of 4703 LGBTQ+ Navigation cases, 7.53% (n=354) had member-reported outcomes. A large majority of LGBTQ+ members agreed or strongly agreed that care navigation resulted in less stress (315/354, 89%), less care avoidance (305/354, 86.2%), higher confidence in finding an identity-affirming provider (327/354, 92.4%), improved ability to comprehend health care information (312/354, 88.1%), and improved ability to engage with providers (308/354, 87%). The average impact score was 4.44 (SD 0.69), with statistically significant differences by gender identity (P=.003), race (P=.01), ethnicity (P=.008), and pronouns (P=.02). The scores were highest for members with multiple gender identities (mean 4.56, SD 0.37) and members who did not provide their race, ethnicity, or pronouns (mean 4.55, SD 0.64). Impact scores were lowest for transgender members (mean 4.11, SD 0.95).
Conclusions: The LGBTQ+ Navigation service, by enhancing members' comprehension and use of necessary care, demonstrates potential public health utility and value. Continuous evaluation of navigation services can serve as a complementary tool for employers seeking to promote health equity and improve employees' sense of belonging. This is particularly important because discrimination and stigma against LGBTQ+ communities persist in the United States. Scalable, system-level changes using navigation services are therefore essential to reach a larger proportion of the LGBTQ+ population.
{"title":"Impact of a Virtual Care Navigation Service on Member-Reported Outcomes Among Lesbian, Gay, Bisexual, Transgender, and Queer Populations: Case Study.","authors":"Seul Ki Choi, Jaclyn Marshall, Patrina Sexton Topper, Andrew Pregnall, José Bauermeister","doi":"10.2196/64137","DOIUrl":"10.2196/64137","url":null,"PeriodicalId":14841,"journal":{"name":"JMIR Formative Research","volume":"9 ","pages":"e64137"},"PeriodicalIF":2.0,"publicationDate":"2025-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11737804/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142949256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
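The impact score described in the Methods above (the mean of 5 Likert responses per respondent, coded 1 to 5) could be computed along these lines. This is a sketch only: the abstract gives just the scale endpoints, so the intermediate labels ("disagree", "neutral", "agree") and the function name `impact_score` are assumptions.

```python
from statistics import fmean

# Endpoint codes (1 and 5) come from the abstract; the three intermediate
# labels are assumed for illustration.
LIKERT_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}


def impact_score(responses):
    """Average the numeric codes of one respondent's 5 outcome responses."""
    if len(responses) != 5:
        raise ValueError("expected exactly 5 responses")
    return fmean(LIKERT_CODES[r] for r in responses)


# Hypothetical respondent
example = ["strongly agree", "agree", "agree", "strongly agree", "neutral"]
print(impact_score(example))  # (5 + 4 + 4 + 5 + 3) / 5 = 4.2
```

Group comparisons such as the ANOVA and t tests mentioned in the abstract would then be run on these per-respondent scores.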
Nicola Bragazzi, Michèle Buchinger, Hisham Atwan, Ruba Tuma, Francesco Chirico, Lukasz Szarpak, Raymond Farah, Rola Khamisy-Farah
Background: The COVID-19 pandemic has significantly strained healthcare systems globally, leading to an overwhelming influx of patients and exacerbating resource limitations. Concurrently, an "infodemic" of misinformation, particularly prevalent in women's health, has emerged. This challenge has been pivotal for healthcare providers, especially gynecologists and obstetricians, in managing pregnant women's health. The pandemic heightened risks for pregnant women from COVID-19, necessitating balanced advice from specialists on vaccine safety versus known risks. Additionally, the advent of generative artificial intelligence (AI), such as large language models (LLMs), offers promising support in healthcare. However, these tools require rigorous testing.
Objective: To assess LLMs' proficiency, clarity, and objectivity regarding COVID-19 impacts in pregnancy.
Methods: This study evaluates four major AI prototypes (ChatGPT-3.5, ChatGPT-4, Microsoft Copilot, and Google Bard) using zero-shot prompts in a questionnaire validated among 159 Israeli gynecologists and obstetricians. The questionnaire assesses proficiency in providing accurate information on COVID-19 in relation to pregnancy. Text mining, sentiment analysis, and readability analysis (Flesch-Kincaid Grade Level and Flesch Reading Ease Score) were also conducted.
Results: In terms of LLMs' knowledge, ChatGPT-4 and Microsoft Copilot each scored 97% (32/33), Google Bard 94% (31/33), and ChatGPT-3.5 82% (27/33). ChatGPT-4 incorrectly stated an increased risk of miscarriage due to COVID-19. Google Bard and Microsoft Copilot had minor inaccuracies concerning COVID-19 transmission and complications. In the sentiment analysis, Microsoft Copilot achieved the least negative score (-4), followed by ChatGPT-4 (-6) and Google Bard (-7), while ChatGPT-3.5 obtained the most negative score (-12). Finally, in the readability analysis, the Flesch-Kincaid Grade Level and Flesch Reading Ease Score showed that Microsoft Copilot was the most accessible at 9.9 and 49, followed by ChatGPT-4 at 12.4 and 37.1, while ChatGPT-3.5 (12.9 and 35.6) and Google Bard (12.9 and 35.8) generated particularly complex responses.
Conclusions: The study highlights varying knowledge levels of LLMs in relation to COVID-19 and pregnancy. ChatGPT-3.5 showed the least knowledge and alignment with scientific evidence. Readability and complexity analyses suggest that each AI's approach was tailored to specific audiences, with ChatGPT versions being more suitable for specialized readers and Microsoft Copilot for the general public. Sentiment analysis revealed notable variations in the way LLMs communicated critical information, underscoring the essential role of neutral and objective healthcare communication in ensuring that pregnant women, particularly vulnerable during the COVID-19 pandemic, receive accurate and reassuring guidance.
{"title":"Proficiency, Clarity, and Objectivity of Large Language Models Versus Specialists' Knowledge on COVID-19 Impacts in Pregnancy: A Cross-Sectional Pilot Study.","authors":"Nicola Bragazzi, Michèle Buchinger, Hisham Atwan, Ruba Tuma, Francesco Chirico, Lukasz Szarpak, Raymond Farah, Rola Khamisy-Farah","doi":"10.2196/56126","DOIUrl":"https://doi.org/10.2196/56126","url":null,"PeriodicalId":14841,"journal":{"name":"JMIR Formative Research","volume":" ","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142965016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
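The two readability metrics named in the abstract above follow standard published formulas, sketched here for reference. The functions take precomputed word, sentence, and syllable counts; how the study's tooling counted syllables is not stated, so the counts in the example are hypothetical.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease Score: higher values indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level: approximate US school grade needed."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical response: 100 words, 5 sentences, 150 syllables
ease = flesch_reading_ease(100, 5, 150)
grade = flesch_kincaid_grade(100, 5, 150)
print(round(ease, 3), round(grade, 2))  # 59.635 9.91
```

Because the two formulas weight sentence length and syllables per word in opposite directions, a lower grade level paired with a higher ease score, as reported for Microsoft Copilot (9.9 and 49), indicates more accessible text.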
Femke Wouters, Henri Gruwez, Christophe Smeets, Anessa Pijalovic, Wouter Wilms, Julie Vranken, Zoë Pieters, Hugo Van Herendael, Dieter Nuyens, Maximo Rivero-Ayerza, Pieter Vandervoort, Peter Haemers, Laurent Pison
Background: Consumer-oriented wearable devices (CWDs) such as smartphones and smartwatches have gained prominence for their ability to detect atrial fibrillation (AF) through proprietary algorithms using electrocardiography or photoplethysmography (PPG)-based digital recordings. Despite numerous individual validation studies, a direct comparison of interdevice performance is lacking.
Objective: This study aimed to evaluate and compare the ability of CWDs to distinguish between sinus rhythm and AF.
Methods: Patients exhibiting sinus rhythm or AF were enrolled through a cardiology outpatient clinic. The participants were instructed to perform heart rhythm measurements using a handheld 6-lead electrocardiogram (ECG) device (KardiaMobile 6L), a smartwatch-derived single-lead ECG (Apple Watch), and two PPG-based smartphone apps (FibriCheck and Preventicus) in a random sequence, with simultaneous 12-lead reference ECG as the gold standard.
Results: A total of 122 participants were included in the study: median age 69 (IQR 61-77) years, 63.9% (n=78) men, 25% (n=30) with AF, 9.8% (n=12) without prior smartphone experience, and 73% (n=89) without experience in using a smartwatch. The sensitivity to detect AF was 100% for all devices. The specificity to detect sinus rhythm was 96.4% (95% CI 89.5%-98.8%) for KardiaMobile 6L, 97.8% (95% CI 91.6%-99.5%) for Apple Watch, 98.9% (95% CI 92.5%-99.8%) for FibriCheck, and 97.8% (95% CI 91.5%-99.4%) for Preventicus (P=.50). Insufficient quality measurements were observed in 10.7% (95% CI 6.3%-17.5%) of cases for both KardiaMobile 6L and Apple Watch, 7.4% (95% CI 3.9%-13.6%) for FibriCheck, and 14.8% (95% CI 9.5%-22.2%) for Preventicus (P=.21). Participants preferred Apple Watch over the other devices to monitor their heart rhythm.
Conclusions: In this study population, the discrimination between sinus rhythm and AF using CWDs based on ECG or PPG was highly accurate, with no significant variations in performance across the examined devices.
{"title":"Comparative Evaluation of Consumer Wearable Devices for Atrial Fibrillation Detection: Validation Study.","authors":"Femke Wouters, Henri Gruwez, Christophe Smeets, Anessa Pijalovic, Wouter Wilms, Julie Vranken, Zoë Pieters, Hugo Van Herendael, Dieter Nuyens, Maximo Rivero-Ayerza, Pieter Vandervoort, Peter Haemers, Laurent Pison","doi":"10.2196/65139","DOIUrl":"10.2196/65139","url":null,"abstract":"<p><strong>Background: </strong>Consumer-oriented wearable devices (CWDs) such as smartphones and smartwatches have gained prominence for their ability to detect atrial fibrillation (AF) through proprietary algorithms using electrocardiography or photoplethysmography (PPG)-based digital recordings. Despite numerous individual validation studies, a direct comparison of interdevice performance is lacking.</p><p><strong>Objective: </strong>This study aimed to evaluate and compare the ability of CWDs to distinguish between sinus rhythm and AF.</p><p><strong>Methods: </strong>Patients exhibiting sinus rhythm or AF were enrolled through a cardiology outpatient clinic. The participants were instructed to perform heart rhythm measurements using a handheld 6-lead electrocardiogram (ECG) device (KardiaMobile 6L), a smartwatch-derived single-lead ECG (Apple Watch), and two PPG-based smartphone apps (FibriCheck and Preventicus) in a random sequence, with simultaneous 12-lead reference ECG as the gold standard.</p><p><strong>Results: </strong>A total of 122 participants were included in the study: median age 69 (IQR 61-77) years, 63.9% (n=78) men, 25% (n=30) with AF, 9.8% (n=12) without prior smartphone experience, and 73% (n=89) without experience in using a smartwatch. The sensitivity to detect AF was 100% for all devices. The specificity to detect sinus rhythm was 96.4% (95% CI 89.5%-98.8%) for KardiaMobile 6L, 97.8% (95% CI 91.6%-99.5%) for Apple Watch, 98.9% (95% CI 92.5%-99.8%) for FibriCheck, and 97.8% (95% CI 91.5%-99.4%) for Preventicus (P=.50). 
Insufficient quality measurements were observed in 10.7% (95% CI 6.3%-17.5%) of cases for both KardiaMobile 6L and Apple Watch, 7.4% (95% CI 3.9%-13.6%) for FibriCheck, and 14.8% (95% CI 9.5%-22.2%) for Preventicus (P=.21). Participants preferred Apple Watch over the other devices to monitor their heart rhythm.</p><p><strong>Conclusions: </strong>In this study population, the discrimination between sinus rhythm and AF using CWDs based on ECG or PPG was highly accurate, with no significant variations in performance across the examined devices.</p>","PeriodicalId":14841,"journal":{"name":"JMIR Formative Research","volume":"9 ","pages":"e65139"},"PeriodicalIF":2.0,"publicationDate":"2025-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11737281/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142949240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
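The specificity and insufficient-quality rates in this record are proportions reported with 95% CIs. As a rough illustration of how such an interval can be obtained, the sketch below computes a Wilson score interval in Python. The abstract does not say which interval method the authors used, so the choice of Wilson is an assumption, and the function name `wilson_interval` and the counts in the usage line are hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z=1.96 corresponds to an approximate 95% confidence level.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    p = successes / n
    denom = 1 + z**2 / n
    # The interval is centred on a shrunken estimate, not on p itself.
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Hypothetical counts (not taken from the study): 90 of 92 sinus-rhythm
# recordings classified correctly.
low, high = wilson_interval(90, 92)
print(f"specificity {90 / 92:.3f}, 95% CI {low:.3f}-{high:.3f}")
```

Unlike the simple normal-approximation interval, the Wilson interval stays inside [0, 1] and remains informative when the observed proportion is close to 100%, which is the usual situation for device specificity.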
Tim Schneider, Timur Cetin, Stefan Uppenkamp, Dirk Weyhe, Thomas Muender, Anke V Reinschluessel, Daniela Salzmann, Verena Uslar
<p><strong>Background: </strong>The integration of advanced technologies such as augmented reality (AR) and virtual reality (VR) into surgical procedures has garnered significant attention. However, the introduction of these innovations requires thorough evaluation in the context of human-machine interaction. Despite their potential benefits, new technologies can complicate surgical tasks and increase the cognitive load on surgeons, potentially offsetting their intended advantages. It is crucial to evaluate these technologies not only for their functional improvements but also for their impact on the surgeon's workload in clinical settings. A surgical team today must increasingly navigate advanced technologies such as AR and VR, aiming to reduce surgical trauma and enhance patient safety. However, each innovation needs to be evaluated in terms of human-machine interaction. Even if an innovation appears to bring advancements to the field it is applied in, it may complicate the work and increase the surgeon's workload rather than benefiting the surgeon.</p><p><strong>Objective: </strong>This study aims to establish a method for objectively determining the additional workload generated using AR or VR glasses in a clinical context for the first time.</p><p><strong>Methods: </strong>Electroencephalography (EEG) signals were recorded using a passive auditory oddball paradigm while 9 participants performed surgical planning for liver resection across 3 different conditions: (1) using AR glasses, (2) VR glasses, and (3) the conventional planning software on a computer.</p><p><strong>Results: </strong>The electrophysiological results, that is, the potentials evoked by the auditory stimulus, were compared with the subjectively perceived stress of the participants, as determined by the National Aeronautics and Space Administration-Task Load Index (NASA-TLX) questionnaire. 
The AR condition had the highest scores for mental demand (median 75, IQR 70-85), effort (median 55, IQR 30-65), and frustration (median 40, IQR 15-75) compared with the VR and PC conditions. The analysis of the EEG revealed a trend toward a lower amplitude of the N1 component as well as for the P3 component at the central electrodes in the AR condition, suggesting a higher workload for participants when using AR glasses. In addition, EEG components in the VR condition did not reveal any noticeable differences compared with the EEG components in the conventional planning condition. For the P1 component, the VR condition elicited significantly earlier latencies at the Fz electrode (mean 75.3 ms, SD 25.8 ms) compared with the PC condition (mean 99.4 ms, SD 28.6 ms).</p><p><strong>Conclusions: </strong>The results suggest a lower stress level when using VR glasses compared with AR glasses, likely due to the 3D visualization of the liver model. Additionally, the alignment between subjectively determined results and objectively determined results confirms the validity of the study design applie
{"title":"Measuring Bound Attention During Complex Liver Surgery Planning: Feasibility Study.","authors":"Tim Schneider, Timur Cetin, Stefan Uppenkamp, Dirk Weyhe, Thomas Muender, Anke V Reinschluessel, Daniela Salzmann, Verena Uslar","doi":"10.2196/62740","DOIUrl":"10.2196/62740","url":null,"abstract":"<p><strong>Background: </strong>The integration of advanced technologies such as augmented reality (AR) and virtual reality (VR) into surgical procedures has garnered significant attention. However, the introduction of these innovations requires thorough evaluation in the context of human-machine interaction. Despite their potential benefits, new technologies can complicate surgical tasks and increase the cognitive load on surgeons, potentially offsetting their intended advantages. It is crucial to evaluate these technologies not only for their functional improvements but also for their impact on the surgeon's workload in clinical settings. A surgical team today must increasingly navigate advanced technologies such as AR and VR, aiming to reduce surgical trauma and enhance patient safety. However, each innovation needs to be evaluated in terms of human-machine interaction. 
Even if an innovation appears to bring advancements to the field it is applied in, it may complicate the work and increase the surgeon's workload rather than benefiting the surgeon.</p><p><strong>Objective: </strong>This study aims to establish a method for objectively determining the additional workload generated using AR or VR glasses in a clinical context for the first time.</p><p><strong>Methods: </strong>Electroencephalography (EEG) signals were recorded using a passive auditory oddball paradigm while 9 participants performed surgical planning for liver resection across 3 different conditions: (1) using AR glasses, (2) VR glasses, and (3) the conventional planning software on a computer.</p><p><strong>Results: </strong>The electrophysiological results, that is, the potentials evoked by the auditory stimulus, were compared with the subjectively perceived stress of the participants, as determined by the National Aeronautics and Space Administration-Task Load Index (NASA-TLX) questionnaire. The AR condition had the highest scores for mental demand (median 75, IQR 70-85), effort (median 55, IQR 30-65), and frustration (median 40, IQR 15-75) compared with the VR and PC conditions. The analysis of the EEG revealed a trend toward a lower amplitude of the N1 component as well as for the P3 component at the central electrodes in the AR condition, suggesting a higher workload for participants when using AR glasses. In addition, EEG components in the VR condition did not reveal any noticeable differences compared with the EEG components in the conventional planning condition. For the P1 component, the VR condition elicited significantly earlier latencies at the Fz electrode (mean 75.3 ms, SD 25.8 ms) compared with the PC condition (mean 99.4 ms, SD 28.6 ms).</p><p><strong>Conclusions: </strong>The results suggest a lower stress level when using VR glasses compared with AR glasses, likely due to the 3D visualization of the liver model. 
Additionally, the alignment between subjectively determined results and objectively determined results confirms the validity of the study design applie","PeriodicalId":14841,"journal":{"name":"JMIR Formative Research","volume":"9 ","pages":"e62740"},"PeriodicalIF":2.0,"publicationDate":"2025-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11754988/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142947891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jaclyn D Borrowman, Lucas J Carr, Gary L Pierce, William T Story, Bethany Barone Gibbs, Kara M Whitaker
<p><strong>Background: </strong>Cardiovascular disease (CVD) is the leading cause of death among women in America. Hypertensive disorders of pregnancy (HDP) negatively impact acute and long-term cardiovascular health, with approximately 16% of all pregnancies affected. With CVD 2-4 times more likely after HDP compared to normotensive pregnancies, effective interventions to promote cardiovascular health are imperative.</p><p><strong>Objective: </strong>With postpartum physical activity (PA) interventions after HDP as an underexplored preventative strategy, we aimed in this study to assess (1) the feasibility and acceptability of a remotely delivered PA intervention for individuals with HDP 3-6 months postpartum and (2) changes in average steps per day, skills related to PA behavior, and postpartum blood pressure (BP).</p><p><strong>Methods: </strong>A remotely delivered 14-week health coaching intervention was designed based on prior formative work. The health coaching intervention called the Hypertensive Disorders of Pregnancy Postpartum Exercise (HyPE) intervention was tested for feasibility and acceptability with a single-arm proof-of-concept study design. A total of 19 women who were 3-6 months postpartum HDP; currently inactive; 18 years of age or older; resided in Iowa; and without diabetes, kidney disease, and CVD were enrolled. Feasibility was assessed by the number of sessions attended and acceptability by self-reported satisfaction with the program. Changes in steps achieved per day were measured with an activPAL4 micro, PA behavior skills via validated surveys online, and BP was assessed remotely with a research-grade Omron Series 5 (Omron Corporation) BP monitor.</p><p><strong>Results: </strong>Participants at enrollment were on average 30.3 years of age, 4.1 months postpartum, self-identified as non-Hispanic White (14/17, 82%), in a committed relationship (16/17, 94%), and had a bachelor's degree (9/17, 53%). 
A total of 140 of 152 possible health coaching sessions were attended by those who started the intervention (n=19, 92%). Intervention completers (n=17) indicated they were satisfied with the program (n=17, 100%) and would recommend it to others (n=17, 100%). No significant changes in activPAL measured steps were observed from pre- to posttesting (mean 138.40, SD 129.40 steps/day; P=.75). Significant improvements were observed in PA behavior skills including planning (mean 5.35, SD 4.97 vs mean 15.06, SD 3.09; P<.001) and monitoring of PA levels (mean 7.29, SD 3.44 vs mean 13.00, SD 2.45; P<.001). No significant decreases were observed for systolic (mean -1.28, SD 3.59 mm Hg; Hedges g=-0.26; P=.16) and diastolic BP (mean -1.80, SD 5.03 mm Hg; Hedges g=-0.44; P=.12).</p><p><strong>Conclusions: </strong>While PA behaviors did not change, the intervention was found to be feasible and acceptable among this sample of at-risk women. After additional refinement, the intervention should be retested among a larger, more diverse, and less p
{"title":"Postpartum Remote Health Coaching Intervention for Individuals With a Hypertensive Disorder of Pregnancy: Proof-of-Concept Study.","authors":"Jaclyn D Borrowman, Lucas J Carr, Gary L Pierce, William T Story, Bethany Barone Gibbs, Kara M Whitaker","doi":"10.2196/65611","DOIUrl":"10.2196/65611","url":null,"abstract":"<p><strong>Background: </strong>Cardiovascular disease (CVD) is the leading cause of death among women in America. Hypertensive disorders of pregnancy (HDP) negatively impact acute and long-term cardiovascular health, with approximately 16% of all pregnancies affected. With CVD 2-4 times more likely after HDP compared to normotensive pregnancies, effective interventions to promote cardiovascular health are imperative.</p><p><strong>Objective: </strong>With postpartum physical activity (PA) interventions after HDP as an underexplored preventative strategy, we aimed in this study to assess (1) the feasibility and acceptability of a remotely delivered PA intervention for individuals with HDP 3-6 months postpartum and (2) changes in average steps per day, skills related to PA behavior, and postpartum blood pressure (BP).</p><p><strong>Methods: </strong>A remotely delivered 14-week health coaching intervention was designed based on prior formative work. The health coaching intervention called the Hypertensive Disorders of Pregnancy Postpartum Exercise (HyPE) intervention was tested for feasibility and acceptability with a single-arm proof-of-concept study design. A total of 19 women who were 3-6 months postpartum HDP; currently inactive; 18 years of age or older; resided in Iowa; and without diabetes, kidney disease, and CVD were enrolled. Feasibility was assessed by the number of sessions attended and acceptability by self-reported satisfaction with the program. 
Changes in steps achieved per day were measured with an activPAL4 micro, PA behavior skills via validated surveys online, and BP was assessed remotely with a research-grade Omron Series 5 (Omron Corporation) BP monitor.</p><p><strong>Results: </strong>Participants at enrollment were on average 30.3 years of age, 4.1 months postpartum, self-identified as non-Hispanic White (14/17, 82%), in a committed relationship (16/17, 94%), and had a bachelor's degree (9/17, 53%). A total of 140 of 152 possible health coaching sessions were attended by those who started the intervention (n=19, 92%). Intervention completers (n=17) indicated they were satisfied with the program (n=17, 100%) and would recommend it to others (n=17, 100%). No significant changes in activPAL measured steps were observed from pre- to posttesting (mean 138.40, SD 129.40 steps/day; P=.75). Significant improvements were observed in PA behavior skills including planning (mean 5.35, SD 4.97 vs mean 15.06, SD 3.09; P<.001) and monitoring of PA levels (mean 7.29, SD 3.44 vs mean 13.00, SD 2.45; P<.001). No significant decreases were observed for systolic (mean -1.28, SD 3.59 mm Hg; Hedges g=-0.26; P=.16) and diastolic BP (mean -1.80, SD 5.03 mm Hg; Hedges g=-0.44; P=.12).</p><p><strong>Conclusions: </strong>While PA behaviors did not change, the intervention was found to be feasible and acceptable among this sample of at-risk women. 
After additional refinement, the intervention should be retested among a larger, more diverse, and less p","PeriodicalId":14841,"journal":{"name":"JMIR Formative Research","volume":"9 ","pages":"e65611"},"PeriodicalIF":2.0,"publicationDate":"2025-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11735014/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142948833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: The rapid proliferation of artificial intelligence (AI) requires new approaches for human-AI interfaces that are different from classic human-computer interfaces. In developing a system that is conducive to the analysis and use of health big data (HBD), reflecting the empirical characteristics of users who have performed HBD analysis is the most crucial aspect to consider. Recently, human-centered design methodology, a field of user-centered design, has been expanded and is used not only to develop types of products but also technologies and services.
Objective: This study was conducted to integrate and analyze users' experiences along the HBD analysis journey using the human-centered design methodology and reflect them in the development of AI agents that support future HBD analysis. This research aims to help accelerate the development of novel human-AI interfaces for AI agents that support the analysis and use of HBD, which will be urgently needed in the near future.
Methods: Using human-centered design methodology, we collected data through shadowing and in-depth interviews with 16 people with experience in analyzing and using HBD. We identified users' empirical characteristics, emotions, pain points, and needs related to HBD analysis and use and created personas and journey maps.
Results: The general characteristics of participants (n=16) were as follows: the majority were in their 40s (n=6, 38%) and held a PhD degree (n=10, 63%). Professors (n=7, 44%) and health care personnel (n=10, 63%) represented the largest professional groups. Participants' experiences with big data analysis varied, with 25% (n=4) being beginners and 38% (n=6) having extensive experience. Common analysis methods included statistical analysis (n=7, 44%) and data mining (n=6, 38%). Qualitative findings from shadowing and in-depth interviews revealed key challenges: lack of knowledge about using analytical solutions, crisis management difficulties during errors, and inadequate understanding of health care data and clinical decision-making, especially among non-health care professionals. Three types of personas and journey maps were derived: health care professionals who are big data analysis beginners, health care professionals with experience in big data analytics, and non-health care professionals who are experts in big data analytics. These showed a need for personalized platforms tailored to the user level, appropriate direction through a navigation function, a crisis management support system, communication and sharing among users, and an expert linkage service.
Conclusions: The knowledge obtained from this study can be leveraged in designing an AI agent to support future HBD analysis and use. This is expected to further increase the usability of HBD by helping users make effective use of HBD more easily.
{"title":"Development of Personas and Journey Maps for Artificial Intelligence Agents Supporting the Use of Health Big Data: Human-Centered Design Approach.","authors":"Yoon Heui Lee, Hanna Choi, Soo-Kyoung Lee","doi":"10.2196/67272","DOIUrl":"10.2196/67272","url":null,"abstract":"<p><strong>Background: </strong>The rapid proliferation of artificial intelligence (AI) requires new approaches for human-AI interfaces that are different from classic human-computer interfaces. In developing a system that is conducive to the analysis and use of health big data (HBD), reflecting the empirical characteristics of users who have performed HBD analysis is the most crucial aspect to consider. Recently, human-centered design methodology, a field of user-centered design, has been expanded and is used not only to develop types of products but also technologies and services.</p><p><strong>Objective: </strong>This study was conducted to integrate and analyze users' experiences along the HBD analysis journey using the human-centered design methodology and reflect them in the development of AI agents that support future HBD analysis. This research aims to help accelerate the development of novel human-AI interfaces for AI agents that support the analysis and use of HBD, which will be urgently needed in the near future.</p><p><strong>Methods: </strong>Using human-centered design methodology, we collected data through shadowing and in-depth interviews with 16 people with experience in analyzing and using HBD. We identified users' empirical characteristics, emotions, pain points, and needs related to HBD analysis and use and created personas and journey maps.</p><p><strong>Results: </strong>The general characteristics of participants (n=16) were as follows: the majority were in their 40s (n=6, 38%) and held a PhD degree (n=10, 63%). Professors (n=7, 44%) and health care personnel (n=10, 63%) represented the largest professional groups. 
Participants' experiences with big data analysis varied, with 25% (n=4) being beginners and 38% (n=6) having extensive experience. Common analysis methods included statistical analysis (n=7, 44%) and data mining (n=6, 38%). Qualitative findings from shadowing and in-depth interviews revealed key challenges: lack of knowledge on using analytical solutions, crisis management difficulties during errors, and inadequate understanding of health care data and clinical decision-making, especially among non-health care professionals. Three types of personas and journey maps-health care professionals as big data analysis beginners, health care professionals who have experience in big data analytics, and non-health care professionals who are experts in big data analytics-were derived. They showed a need for personalized platforms tailored to the user level, appropriate direction through a navigation function, a crisis management support system, communication and sharing among users, and expert linkage service.</p><p><strong>Conclusions: </strong>The knowledge obtained from this study can be leveraged in designing an AI agent to support future HBD analysis and use. This is expected to further increase the usability of HBD by helping users perform effective use of HBD more easily.</p>","PeriodicalId":14841,"journal":{"name":"JMIR Formative Research","volume":"9 ","pages":"e67272"},"PeriodicalIF":2.0,"publicationDate":"2025-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11754986/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142949245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
<p><strong>Background: </strong>After suffering for an average of 7 years before diagnosis, endometriosis patients are usually left with more questions than answers about managing their symptoms in the absence of a cure. To help women with endometriosis after their diagnosis, we developed an online support program combining user research, evidence-based medicine, and clinical expertise. Structured around cognitive behavioral therapy (CBT) and the quality-of-life metrics from the Endometriosis Health Profile (EHP) score, the program is designed to guide participants over a 3-month period and is available in France.</p><p><strong>Objective: </strong>This cohort study was designed to measure the impact of a digital health program on the symptom and quality of life levels of women with endometriosis.</p><p><strong>Methods: </strong>Ninety-two participants were included in the pilot study, among a total of 146 program participants who volunteered and were assessed for eligibility for this research. They were recruited either free of charge through employer health insurance or via individual direct access. A control group of women with endometriosis who did not follow the program was recruited (n=404) through social media and a mailing campaign. Questionnaires assessing quality of life and symptom levels were sent to program participants and controls at baseline and at three months via email. The control group was sampled according to initial pain level in order to obtain a similar pain profile between controls and program participants (n=149). Descriptive statistics and statistical tests (Chi-square, Fisher's exact, Wilcoxon, Mann-Whitney U, Student t-tests) were used to analyze intra- and inter-group differences, with Cohen's d measuring effect size for significant results.</p><p><strong>Results: </strong>Over three months, global symptom burden, the general level of pain, anxiety, depression, dysmenorrhea, dysuria, chronic fatigue, neuropathic pain, and endobelly levels improved significantly among program participants. 
These improvements were significantly different from the control group for global symptom burden (mean±SD: participants=-0.7±1.6, controls=-0.3±1.3, P=.048, small d), anxiety (participants=-1.1±2.8, controls=0.2±2.5, P<.001, medium d) and depression levels (participants=-0.9±2.5, controls=0.0±3.1, P=.04, small d), neuropathic pain (participants=-1.0±2.7, controls=-0.1±2.6, P=.004, small d), and endobelly (participants=-0.9±2.5, controls=-0.3±2.4, P=.03, small d). Participant quality of life evolution between baseline and three months improved and significantly differed from the control group for the core part of the EHP-5 (participants=-5.9±21.0, controls=1.0±14.8, P=.03, small d) and the EQ-5D (participants=0.1±0.1, controls=-0.0±0.1, P=.001, medium d). Perceived knowledge of endometriosis was significantly greater at three months among participants than in controls (P<.001).</p><p><strong>Conclusions: </strong>The results from this pilot study suggest that a digital health program providing medical and sci
{"title":"A digital program for daily life management with endometriosis: Pilot study on symptoms and quality of life among participants.","authors":"Zélia Breton, Emilie Stern, Mathilde Pinault, Delphine Lhuillery, Erick Petit, Pierre Panel, Maïa Alexaline","doi":"10.2196/58262","DOIUrl":"https://doi.org/10.2196/58262","url":null,"abstract":"<p><strong>Background: </strong>After suffering for an average of 7 years before diagnosis, endometriosis patients are usually left with more questions than answers about managing their symptoms in the absence of a cure. To help women with endometriosis after their diagnosis, we developed an online support program combining user research, evidence-based medicine, and clinical expertise. Structured around CBT and the quality-of-life metrics from the EHP score, the program is designed to guide participants over a 3-month and is available in France.</p><p><strong>Objective: </strong>This cohort study was designed to measure the impact of a digital health program on the symptom and quality of life levels of women with endometriosis.</p><p><strong>Methods: </strong>Ninety-two participants were included in the pilot study, among a total of 146 program participants who volunteered and assessed for eligibility for this research. They were recruited either free of charge through employer health insurance or via individual direct access. A control group of women with endometriosis who did not follow the program was recruited (n=404) through social media and mailing campaign. Questionnaires assessing quality of life and symptom levels were sent to program participants and controls at baseline and at three months via email. The control group was sampled according to initial pain level in order to obtain a similar pain profile between controls and program participants (n=149). 
Descriptive statistics and statistical tests (Chi-square, Fisher's exact, Wilcoxon, Mann-Whitney U, Student t-tests) were used to analyze intra- and inter-group differences, with Cohen's D measuring effect size for significant results.</p><p><strong>Results: </strong>Over three months, global symptom burden, the general level of pain, anxiety, depression, dysmenorrhea, dysuria, chronic fatigue, neuropathic pain, and endobelly levels improved significantly among program participants. These improvements were significantly different from the control group for global symptom burden (mean±SD: participants=-0.7±1.6, controls=-0.3±1.3, P=.048, small d), anxiety (participants=-1.1±2.8, controls=0.2±2.5, P<.001, medium d) and depression levels (participants=-0.9±2.5, controls=0.0±3.1, P=.04, small d), neuropathic pain (participants=-1.0±2.7, controls=-0.1±2.6, P=.004, small d), and endobelly (participants=-0.9±2.5, controls=-0.3±2.4, P=.03, small d). Participant quality of life evolution between baseline and three months improved and significantly differed from the control group for the core part of the EHP-5 (participants=-5.9±21.0, controls=1.0±14.8, P=.03, small d) and the EQ-5D (participants=0.1±0.1, controls=-0.0±0.1, P=.001, medium d). 
Perceived knowledge of endometriosis was significantly greater at three months among participants than in controls (P<.001).</p><p><strong>Conclusions: </strong>The results from this pilot study suggest that a digital health program providing medical and sci","PeriodicalId":14841,"journal":{"name":"JMIR Formative Research","volume":" ","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142949103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
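The record above labels between-group effects as "small d" or "medium d". The sketch below shows one common way to compute Cohen's d from summary statistics, using a pooled standard deviation. It is an illustration only: the abstract does not state which variant of d the authors used, the function name `cohens_d_pooled` is hypothetical, and the group sizes in the usage line (92 participants, 149 controls) are read from the abstract under the assumption that every enrolled person contributed to each comparison.

```python
import math


def cohens_d_pooled(mean1: float, sd1: float, n1: int,
                    mean2: float, sd2: float, n2: int) -> float:
    """Cohen's d for two independent groups, using the pooled SD."""
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least 2 observations")
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


# Change in anxiety score as reported in the abstract: participants -1.1 (SD 2.8),
# controls 0.2 (SD 2.5). The group sizes are an assumption (see the text above).
d = cohens_d_pooled(-1.1, 2.8, 92, 0.2, 2.5, 149)
print(f"Cohen's d = {d:.2f}")
```

Under these assumptions the magnitude comes out close to 0.5, the conventional threshold for a medium effect. The negative sign only reflects that scores fell more among participants than among controls.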
Lisa M Gandy, Lana V Ivanitskaya, Leeza L Bacon, Rodina Bizri-Baryak
<p><strong>Background: </strong>Sentiment analysis is one of the most widely used methods for mining and examining text. Social media researchers need guidance on choosing between manual and automated sentiment analysis methods.</p><p><strong>Objective: </strong>Popular sentiment analysis tools based on natural language processing (NLP; VADER [Valence Aware Dictionary for Sentiment Reasoning], TEXT2DATA [T2D], and Linguistic Inquiry and Word Count [LIWC-22]), and a large language model (ChatGPT 4.0) were compared with manually coded sentiment scores, as applied to the analysis of YouTube comments on videos discussing the opioid epidemic. Sentiment analysis methods were also examined regarding ease of programming, monetary cost, and other practical considerations.</p><p><strong>Methods: </strong>Evaluation methods included descriptive statistics, receiver operating characteristic (ROC) curve analysis, confusion matrices, Cohen κ, accuracy, specificity, precision, sensitivity (recall), F<sub>1</sub>-score harmonic mean, and the Matthews correlation coefficient. An inductive, iterative approach to content analysis of the data was used to obtain manual sentiment codes.</p><p><strong>Results: </strong>A subset of comments were analyzed by a second coder, producing good agreement between the 2 coders' judgments (κ=0.734). YouTube social media about the opioid crisis had many more negative comments (4286/4871, 88%) than positive comments (79/662, 12%), making it possible to evaluate the performance of sentiment analysis models in an unbalanced dataset. The tone summary measure from LIWC-22 performed better than other tools for estimating the prevalence of negative versus positive sentiment. According to the ROC curve analysis, VADER was best at classifying manually coded negative comments. A comparison of Cohen κ values indicated that NLP tools (VADER, followed by LIWC's tone and T2D) showed only fair agreement with manual coding. 
In contrast, ChatGPT 4.0 had poor agreement and failed to generate binary sentiment scores in 2 out of 3 attempts. Variations in accuracy, specificity, precision, sensitivity, F<sub>1</sub>-score, and MCC did not reveal a single superior model. F<sub>1</sub>-score harmonic means were 0.34-0.38 (SD 0.02) for NLP tools and very low (0.13) for ChatGPT 4.0. None of the MCCs reached a strong correlation level.</p><p><strong>Conclusions: </strong>Researchers studying negative emotions, public worries, or dissatisfaction with social media face unique challenges in selecting models suitable for unbalanced datasets. We recommend VADER, the only cost-free tool we evaluated, due to its excellent discrimination, which can be further improved when the comments are at least 100 characters long. If estimating the prevalence of negative comments in an unbalanced dataset is important, we recommend the tone summary measure from LIWC-22. Researchers using T2D must know that it may only score some data and, compared with other methods, be more ti
{"title":"Public Health Discussions on Social Media: Evaluating Automated Sentiment Analysis Methods.","authors":"Lisa M Gandy, Lana V Ivanitskaya, Leeza L Bacon, Rodina Bizri-Baryak","doi":"10.2196/57395","DOIUrl":"https://doi.org/10.2196/57395","journal":{"name":"JMIR Formative Research","volume":"9","pages":"e57395"},"publicationDate":"2025-01-08","publicationTypes":"Journal Article"}
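The confusion-matrix metrics named in the Methods can be illustrated with a short sketch. This is not the authors' code: the function name `binary_metrics`, the label coding (1 = negative sentiment, the majority class), and the toy labels are all assumptions made for illustration, and the formulas are the standard textbook definitions.

```python
import math


def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix metrics for binary labels (hypothetical helper).

    `positive` marks the target class; here 1 is assumed to mean
    "negative sentiment", the majority class in an unbalanced dataset.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    n = tp + tn + fp + fn

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is zero.
        return num / den if den else 0.0

    accuracy = ratio(tp + tn, n)
    sensitivity = ratio(tp, tp + fn)  # recall
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    # F1 is the harmonic mean of precision and recall.
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    # Matthews correlation coefficient.
    mcc = ratio(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    # Cohen kappa: observed agreement corrected for chance agreement.
    p_expected = ratio((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n)
    kappa = ratio(accuracy - p_expected, 1 - p_expected)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
        "kappa": kappa,
    }


# Toy, made-up labels: 8 of 10 comments are manually coded negative (1).
manual = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
tool = [1, 1, 1, 1, 1, 1, 0, 0, 1, 0]
metrics = binary_metrics(manual, tool)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On this toy input, accuracy is 0.70 and F1 is 0.80, while MCC and Cohen κ are both only about 0.2. This gap is one reason chance-corrected measures are worth reporting alongside accuracy when classes are unbalanced.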