Pub Date: 2025-12-23. eCollection Date: 2025-12-01. DOI: 10.1093/jamiaopen/ooaf166
Katherine Parkin, Ryan Crowley, Rachel Sippy, Shabina Hayat, Yi Zhang, Emily Brewis, Nicole Marshall, Tara Ramsay-Patel, Vahgisha Thirugnanasampanthan, Guy Skinner, Peter Fonagy, Carol Brayne, Anna Moore
Objectives: To create a theoretical framework of mental health risk factors to inform the development of prediction models for young people's mental health problems.
Materials and methods: We created an initial prototype theoretical framework using a rapid literature search and stakeholder discussion. A snowball sampling approach identified experts for the Delphi study. Round 1 sought consensus on the overall approach, framework domains, and life course stages. Round 2 aimed to establish the points in the life course where exposure to specific risk factors would be most influential. Round 3 ranked risk factors within domains by their predictive importance for young people's mental health problems.
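Delphi rankings like those in round 3 must be aggregated across experts into one consensus ordering. The paper does not specify its aggregation procedure, so the sketch below substitutes a simple Borda count over invented risk-factor rankings:

```python
from collections import defaultdict

def borda_rank(expert_rankings):
    """Aggregate per-expert ordered lists into a consensus ranking
    (Borda count: the top item in a list of length n earns n-1 points)."""
    scores = defaultdict(int)
    for ranking in expert_rankings:
        n = len(ranking)
        for pos, item in enumerate(ranking):
            scores[item] += n - 1 - pos
    return sorted(scores, key=lambda item: scores[item], reverse=True)

# Hypothetical rankings from 3 experts (not the study's data)
rankings = [
    ["parental mental illness", "bullying", "poverty"],
    ["bullying", "parental mental illness", "poverty"],
    ["parental mental illness", "poverty", "bullying"],
]
consensus = borda_rank(rankings)  # -> parental mental illness first
```

Other rank-aggregation rules (mean rank, majority judgment) would serve equally well; Borda is shown only because it is compact.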
Results: The final framework reached consensus after 3 rounds and included 287 risk factors across 8 domains and 5 life course stages. Twenty-five experts completed round 3. Domains ranked as most important were "Social and Environmental" and "Psychological and Mental Health." Ranked lists of risk factors within domains and heat maps showing the salience of risk factors across life course stages were generated.
Discussion: The study integrated multidisciplinary expert perspectives and prioritized health equity throughout the framework's development. The ranked risk factor lists and life stage heat maps support the targeted inclusion of risk factors across developmental stages in prediction models.
Conclusion: This theoretical framework provides a roadmap of important risk factors for inclusion in early identification models to enhance the predictive accuracy of childhood mental health problems. It offers a useful theoretical reference point to support model building for those without domain expertise.
"Development of a risk factor framework to inform machine learning prediction of young people's mental health problems: a Delphi study." JAMIA Open. 2025;8(6):ooaf166. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12726920/pdf/
Pub Date: 2025-12-17. eCollection Date: 2025-12-01. DOI: 10.1093/jamiaopen/ooaf164
Sarah Y Bessen, Sean Tackett, Kimberly S Peairs, Lisa Christopher-Stine, Charles M Stewart, Lee D Biddison, Maria Oliva-Hemker, Jennifer K Lee
Objectives: Electronic health record (EHR) work may differently affect women and men physicians. Identifying gender discrepancies in EHR work across different specialties may inform strategies to reduce EHR burdens.
Materials and methods: We retrospectively evaluated EHR use by ambulatory physicians in 4 specialties (2 procedural [cardiology and gastroenterology] and 2 nonprocedural [internal medicine and rheumatology]) during 1 year at a large academic medical institution. Gender differences in EHR and clinical workload across specialties were evaluated by analysis of variance. Mixed-effects linear regression models analyzed gender differences in EHR workload controlling for specialty. Significant differences were additionally examined by stratifying procedural and nonprocedural specialties.
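The study's gender comparisons use ANOVA and mixed-effects regression; as a much-simplified, hedged stand-in, the sketch below computes a Welch two-sample t statistic on hypothetical per-physician after-hours EHR minutes (the data and the test choice are illustrative, not the paper's):

```python
import math

def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    se2a, se2b = va / na, vb / nb
    t = (ma - mb) / math.sqrt(se2a + se2b)
    df = (se2a + se2b) ** 2 / (se2a ** 2 / (na - 1) + se2b ** 2 / (nb - 1))
    return t, df

# Hypothetical minutes of EHR time outside 7:00 AM-7:00 PM
women = [62, 70, 65, 80, 75, 68]
men = [50, 55, 48, 60, 52, 58]
t, df = welch_t(women, men)
```

A mixed-effects model, as used in the paper, would additionally absorb specialty- and physician-level variation rather than treating observations as independent.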
Results: Clinical and EHR workload varied across specialties (P <.05), though scheduled clinical workload did not differ by gender. Controlling for specialty, women physicians spent more time per appointment on In Basket messages (P =.001), sent more Secure Chat messages per appointment (P =.003), and spent more time in the EHR outside 7:00 AM-7:00 PM (P <.001) than men. Gender differences in messaging were concentrated among the procedural physicians. Women procedural physicians spent more time on In Basket messages (P <.001) and sent more Secure Chat messages (P =.007) than men, whereas these differences did not occur among nonprocedural physicians.
Discussion: Women physicians had greater EHR burdens despite scheduled clinical workloads similar to those of men. The greater messaging workload predominantly affected women procedural physicians.
Conclusion: Gender disparities in EHR burden in ambulatory specialties vary between procedural and nonprocedural fields. Future research is needed to mitigate gender inequity in EHR workloads.
"Higher electronic health record burden among women physicians in academic ambulatory medicine." JAMIA Open. 2025;8(6):ooaf164. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12715314/pdf/
Pub Date: 2025-12-14. eCollection Date: 2025-12-01. DOI: 10.1093/jamiaopen/ooaf168
Robin Austin, Malin Britt Lalich, Katy Stewart, Jonna Zarbano, Matthew Byrne, Melissa D Pinto, Elizabeth E Umberfield
Objectives: The primary objective of this research is to assess the content coverage of nursing data within a publicly available common data model (CDM), focusing on how nursing data, documented in flowsheets, are represented within the model.
Materials and methods: This mapping study was informed by previous evaluation studies and serves as a framework for evaluating information resources, including guiding their development and implementation. The overall research process consists of 4 steps: (1) identify a CDM; (2) define evaluation criteria; (3) map nursing flowsheet data; and (4) apply evaluation criteria.
Results: Overall, 65.5% (n = 1170) of the flowsheet concepts were mapped to Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT) and Logical Observation Identifiers Names and Codes (LOINC) target codes and 56.0% (n = 1831) of the flowsheet values were mapped to SNOMED CT and LOINC target codes. The flowsheet concepts had a higher average mapping time per concept/reviewer (1.19 min) as compared to the average mapping time per value/reviewer (0.64 min).
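The coverage percentages and per-item mapping times above are simple ratios; a minimal sketch (the counts below are hypothetical stand-ins, not the study's denominators):

```python
def coverage(n_mapped: int, n_total: int) -> float:
    """Percent of flowsheet items mapped to a SNOMED CT/LOINC target code."""
    return 100.0 * n_mapped / n_total

def avg_map_time(total_minutes: float, n_items: int, n_reviewers: int) -> float:
    """Average mapping time per item per reviewer, in minutes."""
    return total_minutes / (n_items * n_reviewers)

concept_coverage = coverage(131, 200)          # hypothetical counts -> 65.5
per_item_minutes = avg_map_time(238, 100, 2)   # hypothetical totals -> 1.19
```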
Discussion: This mapping study demonstrated the progress and ongoing challenges of mapping nursing data to a national common data model. However, the ability to use nursing data at scale in a national CDM remains limited until more comprehensive mapping is completed.
Conclusion: This mapping study identifies a significant gap in integrating nursing data into a national common data model, highlighting an opportunity to enhance patient care through improved real-time insights and evidence-based nursing practices. Addressing this gap can help shape policies that prioritize the inclusion of nursing data. Additionally, aligning nursing data at scale can advance research, increase efficiency, and optimize nurse-sensitive patient outcomes.
"Exploring common data model coverage of nursing flowsheet data: a pilot study using SNOMED CT and LOINC mapping." JAMIA Open. 2025;8(6):ooaf168. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12701890/pdf/
Pub Date: 2025-12-10. eCollection Date: 2025-12-01. DOI: 10.1093/jamiaopen/ooaf124
[This corrects the article DOI: 10.1093/jamiaopen/ooz061.].
"Correction to: Response to survey directed to patient portal members differs by age, race, and healthcare utilization." JAMIA Open. 2025;8(6):ooaf124. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12706857/pdf/
Pub Date: 2025-12-09. eCollection Date: 2025-12-01. DOI: 10.1093/jamiaopen/ooaf127
Juan Antonio Lossio-Ventura, Samuel Frank, Grace Ringlein, Kirsten Bonson, Ardyn Olszko, Abbey Knobel, Daniel S Pine, Jennifer B Freeman, Kristen Benito, David C Jangraw, Francisco Pereira
Objective: To develop and evaluate an automated classification system for labeling Exposure Process Coding System (EPCS) quality codes, specifically exposure and encourage events, during in-person exposure therapy sessions using automatic speech recognition (ASR) and natural language processing techniques.
Materials and methods: The system was trained and tested on 360 manually labeled pediatric Obsessive-Compulsive Disorder (OCD) therapy sessions from 3 clinical trials. Audio recordings were transcribed using ASR tools (OpenAI's Whisper and Google Speech-to-Text). Transcription accuracy was evaluated via word error rate (WER), comparing manual transcriptions of 2-minute audio segments against the ASR-generated transcripts. The resulting text was analyzed with transformer-based models, including Bidirectional Encoder Representations from Transformers (BERT), Sentence-BERT, and Meta Llama 3. Models were trained to predict EPCS codes in 2 classification settings: sequence-level classification, where events are labeled in delimited text chunks, and token-level classification, where event boundaries are unknown. Classification was performed either with fine-tuned transformer-based models or with logistic regression on embeddings produced by each model.
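WER, the metric used above to compare ASR tools, is the word-level edit distance between reference and hypothesis divided by the reference length. A self-contained sketch (not the authors' implementation):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Single-row dynamic-programming table over hypothesis positions
    d = list(range(len(hyp) + 1))
    for i, rw in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, hw in enumerate(hyp, 1):
            cur = d[j]
            # deletion, insertion, substitution/match
            d[j] = min(d[j] + 1, d[j - 1] + 1, prev + (rw != hw))
            prev = cur
    return d[len(hyp)] / len(ref)
```

For example, `wer("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion over six reference words.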
Results: With respect to transcription accuracy, Whisper outperformed Google Speech-to-Text with a lower WER (0.31 vs 0.51). In the sequence-level classification setting, Llama 3 models achieved high performance, with area under the ROC curve (AUC) scores of 0.95 for exposure events and 0.75 for encourage events, outperforming traditional methods and standard BERT models. In the token-level setting, fine-tuned BERT models performed best, achieving AUC scores of 0.85 for exposure events and 0.75 for encourage events.
Discussion and conclusion: Current ASR and transformer-based models enable automated quality coding of in-person exposure therapy sessions. These findings demonstrate potential for real-time assessment in clinical practice and scalable research on effective therapy methods. Future work should focus on optimization, including improvements in ASR accuracy, expanding training datasets, and multimodal data integration.
"Automated classification of exposure and encourage events in speech data from pediatric OCD treatment." JAMIA Open. 2025;8(6):ooaf127. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12696644/pdf/
Pub Date: 2025-12-09. eCollection Date: 2025-12-01. DOI: 10.1093/jamiaopen/ooaf156
Whitney Shae, Md Saiful Islam Saif, John Fife, Dinesh Pal Mudaranthakam, Dong Pei, Lisa Harlan-Williams, Jeffrey A Thompson, Devin C Koestler
Objectives: The objective of this study was to develop and test natural language processing (NLP) methods for screening and, ultimately, predicting the cancer relevance of peer-reviewed publications.
Materials and methods: Two datasets were used: (1) manually curated publications labeled for cancer relevance, co-authored by members of The University of Kansas Cancer Center (KUCC) and (2) a derived dataset containing cancer-related abstracts from American Association for Cancer Research journals and noncancer-related abstracts from other medical journals. Two text encoding methods were explored: term frequency-inverse document frequency (TF-IDF) vectorization and various BERT embeddings. These representations served as inputs to 3 supervised machine learning classifiers: Support Vector Classification (SVC), Gradient Boosting Classification, and Multilayer Perceptron (MLP) neural networks. Model performance was evaluated by comparing predictions to the "true" cancer-relevant labels in a withheld test set.
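The TF-IDF representation named above weights each term by its in-document frequency times the log inverse of its document frequency. A pure-Python sketch with unsmoothed IDF for clarity (the example abstracts are invented, and production work would typically use a library vectorizer):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """TF-IDF weights for a list of tokenized documents.
    TF = count / doc length; IDF = log(N / document frequency)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))            # count each term once per document
    idf = {t: math.log(n / c) for t, c in df.items()}
    return [
        {t: (c / len(doc)) * idf[t] for t, c in Counter(doc).items()}
        for doc in docs
    ]

# Invented abstract fragments
docs = [
    "tumor growth in breast cancer".split(),
    "randomized trial of cancer screening".split(),
    "hospital staffing and workflow".split(),
]
vecs = tfidf_vectors(docs)
```

Terms appearing in fewer documents ("tumor") receive higher weight than widely shared ones ("cancer"), which is what lets a downstream classifier separate cancer-relevant from other abstracts.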
Results: All machine learning models performed best when trained and tested within the derived dataset. Across the datasets, SVC and MLP both exhibited strong performance, with F1 scores as high as 0.976 and 0.997, respectively. BioBERT embeddings resulted in slightly higher metrics when compared to TF-IDF vectorization across most models.
Discussion: Models trained on the derived data performed very well internally; however, performance was weaker when these models were tested on the KUCC dataset. This finding highlights the subjective nature of cancer-relevance determinations. In contrast, KUCC-trained models had high predictive performance when tested on the derived dataset's classifications, suggesting that models trained on the KUCC dataset may be suitable for broader cancer-relevance prediction.
Conclusions: Overall, our results suggest that NLP can effectively automate the classification of cancer-relevant publications, enhancing research productivity tracking; however, great care should be taken in selecting the appropriate data, text representation approach, and machine learning approach.
"Utilizing natural language processing to identify cancer-relevant publications at a National Cancer Institute-designated cancer center." JAMIA Open. 2025;8(6):ooaf156. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12696645/pdf/
Pub Date: 2025-12-06. eCollection Date: 2025-12-01. DOI: 10.1093/jamiaopen/ooaf165
Yunfeng Liang, Lin Zou, Millie Ming Rong Goh, Alvin Jia Hao Ngeow, Ngiap Chuan Tan, Andy Wee An Ta, Han Leong Goh
Objective: Neonatal jaundice monitoring is resource-intensive. Existing artificial intelligence methods use image or clinical data, but none systematically combine both or compare feature contributions. This study fills that gap by extracting and analyzing multimodal features on a large dataset, identifying an optimal feature set for accurate, accessible jaundice assessment.
Materials and methods: This study collected clinical data and skin images from 3 body regions of 633 neonates, generating 460 features across 4 categories. Four tree-based models were used to predict total serum bilirubin levels, and feature importance analysis guided the selection of an optimal feature set.
Results: The optimal performance was achieved using the Light Gradient Boosting Machine (LGBM) model with 140 selected features, yielding a root mean square error (RMSE) of 2.0477 mg/dL and a Pearson correlation of 0.8435. This represents a performance gain of over 10% in RMSE compared to models using only a single data modality. Moreover, selecting the top 30 features based on SHapley Additive exPlanation (SHAP) allows for a substantial reduction in data dimensionality, while maintaining performance within 5% of the optimal model.
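The evaluation metrics above (RMSE, Pearson correlation) and the top-30 feature cut can be sketched as follows. The feature names and importance scores are hypothetical; the study derived its rankings from SHAP values of an LGBM model, not a toy dictionary:

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def top_k_features(importances, k):
    """Keep the k feature names with the largest importance scores."""
    return sorted(importances, key=importances.get, reverse=True)[:k]

selected = top_k_features({"abdomen_hue": 0.5, "texture_var": 0.1,
                           "hour_of_life": 0.3}, 2)
```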
Discussion: Color features contributed over 60% of the total importance, with clinical data adding more than 25%, led by hour of life. Light temperature also affected predictions, while texture features had minimal impact. Among body regions, the abdomen provided the most informative signals for jaundice severity.
Conclusion: The proposed algorithm shows promise for real-world use by enabling timely, automated jaundice assessment for families, while also offering insights for future research and broader medical applications.
"Multimodal feature analysis for automated neonatal jaundice assessment using machine learning." JAMIA Open. 2025;8(6):ooaf165. DOI: 10.1093/jamiaopen/ooaf165. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12687590/pdf/
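The jaundice study's workflow, training a gradient-boosted regressor on a large multimodal feature set and then retraining on the top 30 features by importance, can be sketched as below. This is a minimal illustration on synthetic data, not the authors' code: scikit-learn's GradientBoostingRegressor stands in for LGBM, and built-in impurity importances stand in for SHAP values.

```python
# Sketch: importance-based feature selection for a bilirubin-style regression.
# Synthetic data; GradientBoostingRegressor is a stand-in for LGBM, and
# feature_importances_ is a stand-in for SHAP-based ranking.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 60))                          # synthetic "multimodal" features
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=300)   # outcome driven by 5 features

full = GradientBoostingRegressor(random_state=0).fit(X, y)

# Keep the k most important features and retrain a compact model,
# mirroring the paper's top-30 dimensionality reduction.
k = 30
top = np.argsort(full.feature_importances_)[::-1][:k]
compact = GradientBoostingRegressor(random_state=0).fit(X[:, top], y)

rmse_full = mean_squared_error(y, full.predict(X)) ** 0.5
rmse_compact = mean_squared_error(y, compact.predict(X[:, top])) ** 0.5
print(f"RMSE full={rmse_full:.3f}  compact(top-{k})={rmse_compact:.3f}")
```

In practice SHAP rankings are preferred over impurity importances because they are less biased toward high-cardinality features; the selection-and-retrain loop is the same either way.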
Pub Date : 2025-12-03eCollection Date: 2025-12-01DOI: 10.1093/jamiaopen/ooaf162
Jason K Hou, Tiffany M Tang, Shubhada Sansgiry, Tony Van, Peter A Richardson, Codey Pham, Francesca Cunningham, Jessica A Baker, Ji Zhu, Akbar K Waljee
Objectives: Prediction models using statistical or machine learning (ML) approaches can enhance clinical decision support tools. Infliximab (IFX), a biologic with a newly introduced biosimilar for Crohn's disease (CD) and ulcerative colitis (UC), presents an opportunity to evaluate these tools at the time of biosimilar switch to predict disease flares. This study sought to evaluate real-world safety and effectiveness of nonmedical IFX biosimilar switch in a national US cohort of CD and UC patients, and to develop and compare interpretable models for predicting adverse clinical events among patients on maintenance IFX.
Materials and methods: This retrospective cohort study used administrative and clinical data from the National Veterans Health Administration Corporate Data Warehouse. It included 2529 Veterans with CD or UC on maintenance IFX (2017-2020), either continuing originator IFX or switching to a biosimilar. The primary outcome was disease-related flare. Classification and survival models were developed using traditional and ML methods and assessed via receiver operating characteristic curve, precision-recall curve, and decision curve analysis.
Results: In 2529 Veterans with CD or UC, biosimilar switch had low predictive importance across survival models. Objective laboratory data yielded the highest validation performance. Random forest+ (RF+) outperformed all other statistical and ML models. Prior flares and total health-care encounters were the 2 most important predictors, while hemoglobin was the top laboratory predictor.
Conclusions: Prediction models, particularly RF+, may aid in optimizing biologic therapy for CD and UC by identifying patients at higher risk of flare following a biosimilar switch.
"Using machine learning algorithms to optimize treatment with high-cost biologics in a national cohort of patients with inflammatory bowel disease." JAMIA Open. 2025;8(6):ooaf162. DOI: 10.1093/jamiaopen/ooaf162. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12681052/pdf/
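The IBD study evaluates classifiers with receiver operating characteristic, precision-recall, and decision curve analysis. A minimal sketch of that three-metric evaluation on synthetic data is below; a plain RandomForestClassifier stands in for RF+, and the feature names in the comment are illustrative, not the study's variables.

```python
# Sketch: evaluating a flare-prediction classifier with the three metrics
# named in the abstract: ROC AUC, precision-recall (average precision), and
# decision-curve net benefit. Synthetic data; RandomForestClassifier stands
# in for RF+.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 10))   # e.g. labs, encounters, prior flares (illustrative)
y = (X[:, 0] + X[:, 1] + rng.normal(size=1000) > 1).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
p = RandomForestClassifier(random_state=0).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

auroc = roc_auc_score(y_te, p)
auprc = average_precision_score(y_te, p)

def net_benefit(y_true, prob, t):
    """Decision-curve net benefit at threshold probability t."""
    treat = prob >= t
    tp = np.sum(treat & (y_true == 1))
    fp = np.sum(treat & (y_true == 0))
    n = len(y_true)
    return tp / n - fp / n * (t / (1 - t))

print(f"AUROC={auroc:.2f}  AUPRC={auprc:.2f}  NB@0.2={net_benefit(y_te, p, 0.2):.3f}")
```

Decision curve analysis complements the threshold-free AUROC/AUPRC by asking whether acting on the model's predictions at a clinically chosen risk threshold beats treating everyone or no one.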
Pub Date : 2025-12-01DOI: 10.1093/jamiaopen/ooaf158
Michael Colacci, Chloe Pou-Prom, Arjumand Siddiqi, Muhammad Mamdani, Amol A Verma
Background: Bias evaluations of machine learning (ML) models often focus on performance in research settings, with limited assessment of downstream bias following clinical deployment. The objective of this study was to evaluate whether CHARTwatch, a real-time ML early warning system for inpatient deterioration, demonstrated algorithmic bias in model performance or produced disparities in care processes and outcomes across patient sociodemographic groups.
Methods: We evaluated CHARTwatch implementation on the internal medicine service at a large academic hospital. Patient outcomes during the intervention period (November 1, 2020-June 1, 2022) were compared to the control period (November 1, 2016-December 31, 2019) using propensity score overlap weighting. We evaluated differences across key sociodemographic subgroups, including age, sex, homelessness, and neighborhood-level socioeconomic and racialized composition. Outcomes included model performance (sensitivity and specificity), processes of care, and patient outcomes (non-palliative in-hospital death).
Results: Among 12 877 patients (9079 control, 3798 intervention), 13.3% were experiencing homelessness and 36.9% lived in the quintile with the highest neighborhood racialized and newcomer populations. Model sensitivity was 70.1% overall, with no significant variation across subgroups. Model specificity varied by age (<60 years: 93% [95% confidence interval (CI) 91-95%]; 60-80 years: 90% [95% CI 87-92%]; >80 years: 84% [95% CI 79-88%]; P < .001) but not by other subgroups. CHARTwatch implementation was associated with an increase in code status documentation among patients experiencing homelessness, without significant differences in other care processes or outcomes.
Conclusion: CHARTwatch model performance and impact were generally consistent across measured sociodemographic subgroups. ML-based clinical decision support tools, and associated standardization of care, may reduce existing inequities, as was observed for code status orders among patients experiencing homelessness. This evaluation provides a framework for future bias assessments of deployed ML-CDS tools.
"Evaluating sociodemographic bias in a deployed machine-learned patient deterioration model." JAMIA Open. 2025;8(6):ooaf158. DOI: 10.1093/jamiaopen/ooaf158. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12668680/pdf/
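The CHARTwatch evaluation compared intervention and control periods using propensity score overlap weighting. The idea can be sketched as follows on synthetic data (this is a generic illustration of the weighting scheme, not the study's analysis): each treated unit is weighted by 1 - e(x) and each control by e(x), where e(x) is the estimated propensity score, which down-weights units far from equipoise.

```python
# Sketch: propensity score overlap weighting on synthetic confounded data.
# Treated units get weight 1 - e(x), controls get e(x); the weighted
# difference in means targets the average effect in the overlap population.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 4))                          # baseline covariates
treat = (rng.random(2000) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)
y = 2.0 * treat + X[:, 0] + rng.normal(size=2000)       # outcome confounded by X0

e = LogisticRegression().fit(X, treat).predict_proba(X)[:, 1]
w = np.where(treat == 1, 1 - e, e)                      # overlap weights

def wmean(v, weights):
    return np.sum(v * weights) / np.sum(weights)

naive = y[treat == 1].mean() - y[treat == 0].mean()
ate_o = wmean(y[treat == 1], w[treat == 1]) - wmean(y[treat == 0], w[treat == 0])
print(f"naive diff={naive:.2f}  overlap-weighted diff={ate_o:.2f}  (true effect=2.0)")
```

Because the confounder X0 drives both treatment and outcome, the naive difference is biased upward, while the overlap-weighted contrast recovers an estimate near the true effect; overlap weights are also bounded in (0, 1), avoiding the extreme weights that inverse-probability weighting can produce.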
Pub Date : 2025-12-01DOI: 10.1093/jamiaopen/ooaf134
Ellen Wright Clayton, Susannah Rose, Camille Nebecker, Laurie Novak, Yael Bensoussan, You Chen, Benjamin X Collins, Ashley Cordes, Barbara J Evans, Kadija S Ferryman, Samantha Hurst, Xiaoqian Jiang, Aaron Y Lee, Shannon McWeeney, Jillian Parker, Jean-Christophe Bélisle-Pipon, Eric Rosenthal, Zhijun Yin, Joseph Yracheta, Bradley Adam Malin
Objectives: The NIH's Bridge2AI Program has funded 4 "new flagship biomedical and behavioral datasets that are properly documented and ready for use with AI [artificial intelligence] or ML [machine learning] technologies" to promote the adoption of AI. This article discusses the challenges and lessons learned in data collection and governance to ensure their responsible use.
Materials and methods: We outline major steps involved in creating and using these datasets in ethically acceptable ways, including (1) data selection (what data are being selected and why), (2) increasing attention to public concerns, (3) the role of participant consent depending on data source, (4) ensuring responsible use, (5) where and how data are stored, (6) what control participants have over data sharing, (7) data access, and (8) data download.
Results: We discuss ethical, legal, social, and practical challenges raised at each step of creating AI-ready datasets, noting the importance of addressing issues of future data storage and use. We identify some of the many choices that these projects have made, including how to incorporate public input, where to store data, and defining criteria for access to and downloading data.
Discussion: The processes involved in the establishment and governance of the Bridge2AI datasets vary widely but have common elements, suggesting opportunities for future programs to lean upon Bridge2AI strategies.
Conclusions: This article discusses the challenges and lessons learned in data collection and governance to ensure their responsible use, particularly as confronted by the 4 distinct projects funded by this program.
"Biomedical data repositories require governance for artificial intelligence/machine learning applications at every step." JAMIA Open. 2025;8(6):ooaf134. DOI: 10.1093/jamiaopen/ooaf134. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12668681/pdf/