
AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science: Latest Articles

Best of Both Worlds: Bridging One Model for All and Group-Specific Model Approaches using Ensemble-based Subpopulation Modeling.
Purity Mugambi, Stephanie Carreiro

Subpopulation models are of increasing interest for predicting clinical outcomes because they promise to perform better for underrepresented patient subgroups. However, the personalization benefits of these models come at the cost of statistical power, and the models can be impractical when the subpopulation's sample size is small. We hypothesize that a hierarchical model in which population information is integrated into subpopulation models would preserve the personalization benefits and offset the loss of power. In this work, we integrate ideas from ensemble modeling, personalization, and hierarchical modeling and build ensemble-based subpopulation models in which specialization relies on whole group samples. This approach significantly improves the precision of the positive class, especially for the underrepresented subgroups, with minimal cost to the recall. It consistently outperforms the one-model-for-all and one-model-per-subgroup approaches, especially in the presence of high class imbalance, for subgroups with at least 380 training samples.

AMIA Joint Summits on Translational Science proceedings, vol. 2024, pp. 354-363. Published 2024-05-31. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141864/pdf/
Citations: 0
Measuring and Reducing Racial Bias in a Pediatric Urinary Tract Infection Model.
Joshua W Anderson, Nader Shaikh, Shyam Visweswaran

Clinical predictive models that include race as a predictor have the potential to exacerbate disparities in healthcare. Such models can be respecified to exclude race or optimized to reduce racial bias. We investigated the impact of such respecifications in a predictive model - UTICalc - which was designed to reduce catheterizations in young children with suspected urinary tract infections. To reduce racial bias, race was removed from the UTICalc logistic regression model and replaced with two new features. We compared the two versions of UTICalc using fairness and predictive performance metrics to understand the effects on racial bias. In addition, we derived three new models for UTICalc to specifically improve racial fairness. Our results show that, as predicted by previously described impossibility results, fairness cannot be simultaneously improved on all fairness metrics, and model respecification may improve racial fairness but decrease overall predictive performance.

AMIA Joint Summits on Translational Science proceedings, vol. 2024, pp. 488-497. Published 2024-05-31. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141814/pdf/
Citations: 0
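
As an illustration of the kind of group-wise comparison the UTICalc abstract above describes, the following minimal Python sketch computes per-group true-positive and false-positive rates and the gaps between groups. The abstract does not say which fairness metrics were used, so the equalized-odds-style gaps, the function names, and the toy data here are all assumptions chosen for illustration, not the authors' method.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Return {group: (TPR, FPR)} for binary labels and binary predictions."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth:
            key = "tp" if pred else "fn"
        else:
            key = "fp" if pred else "tn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        tpr = c["tp"] / positives if positives else float("nan")
        fpr = c["fp"] / negatives if negatives else float("nan")
        rates[group] = (tpr, fpr)
    return rates


def equalized_odds_gaps(rates):
    """Return (TPR gap, FPR gap): the largest between-group differences."""
    tprs = [tpr for tpr, _ in rates.values()]
    fprs = [fpr for _, fpr in rates.values()]
    return max(tprs) - min(tprs), max(fprs) - min(fprs)


# Toy data (invented): two hypothetical groups, "A" and "B".
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = group_rates(y_true, y_pred, groups)
tpr_gap, fpr_gap = equalized_odds_gaps(rates)
print(rates, tpr_gap, fpr_gap)
```

A model can narrow one of these gaps while widening another, which is the practical face of the impossibility results the abstract mentions.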
Clarifying Chronic Obstructive Pulmonary Disease Genetic Associations Observed in Biobanks via Mediation Analysis of Smoking.
Katrina Bazemore, Jaehyun Joo, Wei-Ting Hwang, Blanca E Himes

Varying case definitions of COPD have heterogeneous genetic risk profiles, potentially reflective of disease subtypes or classification bias (e.g., smokers more likely to be diagnosed with COPD). To better understand differences in genetic loci associated with ICD-defined versus spirometry-defined COPD, we contrasted their GWAS results with those for heavy smoking among 337,138 UK Biobank participants. Overlapping risk loci were found in/near the genes ZEB2, FAM136B, CHRNA3, and CHRNA4, with the CHRNA3 locus shared across all three traits. Mediation analysis to estimate the effects of lead genotyped variants mediated by smoking found significant indirect effects for the FAM136B, CHRNA3, and CHRNA4 loci for both COPD definitions. Adjustment for mediator-outcome confounders modestly attenuated indirect effects, though in the CHRNA4 locus for spirometry-defined COPD the proportion mediated increased an additional 8.47%. Our results suggest that differences between ICD-defined and spirometry-defined COPD-associated genetic loci are not a result of smoking biasing classification.

AMIA Joint Summits on Translational Science proceedings, vol. 2024, pp. 499-508. Published 2024-05-31. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141825/pdf/
Citations: 0
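
The "proportion mediated" reported in the COPD abstract above can be made concrete with a small Python sketch. It assumes effects on an additive scale where the total effect equals the direct effect plus the indirect (smoking-mediated) effect; the abstract does not state which estimator the authors used, and the function name and the numbers below are hypothetical.

```python
def proportion_mediated(total_effect, direct_effect):
    """Share of the total effect that passes through the mediator.

    Assumes an additive decomposition: total = direct + indirect.
    """
    if total_effect == 0:
        raise ValueError("proportion mediated is undefined when the total effect is zero")
    indirect_effect = total_effect - direct_effect
    return indirect_effect / total_effect


# Hypothetical numbers, not taken from the paper.
pm = proportion_mediated(total_effect=0.20, direct_effect=0.15)
print(f"proportion mediated: {pm:.2%}")
```

Under this decomposition, adjusting for mediator-outcome confounders changes the direct and indirect estimates, which is why the proportion mediated can move in either direction after adjustment.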
PFERM: A Fair Empirical Risk Minimization Approach with Prior Knowledge.
Bojian Hou, Andrés Mondragón, Davoud Ataee Tarzanagh, Zhuoping Zhou, Andrew J Saykin, Jason H Moore, Marylyn D Ritchie, Qi Long, Li Shen

Fairness is crucial in machine learning to prevent bias based on sensitive attributes in classifier predictions. However, the pursuit of strict fairness often sacrifices accuracy, particularly when significant prevalence disparities exist among groups, making classifiers less practical. For example, Alzheimer's disease (AD) is more prevalent in women than men, making equal treatment inequitable for females. Accounting for prevalence ratios among groups is essential for fair decision-making. In this paper, we introduce prior knowledge for fairness, which incorporates prevalence ratio information into the fairness constraint within the Empirical Risk Minimization (ERM) framework. We develop the Prior-knowledge-guided Fair ERM (PFERM) framework, aiming to minimize expected risk within a specified function class while adhering to a prior-knowledge-guided fairness constraint. This approach strikes a flexible balance between accuracy and fairness. Empirical results confirm its effectiveness in preserving fairness without compromising accuracy.

AMIA Joint Summits on Translational Science proceedings, vol. 2024, pp. 211-220. Published 2024-05-31. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141835/pdf/
Citations: 0
Automated HIV Case Identification from the MIMIC-IV Database.
Kai Jiang, Tru Cao

Automatic HIV phenotyping is needed for HIV research based on electronic health records (EHRs). MIMIC-IV, an extension of MIMIC-III, contains more than 520,000 hospital admissions and has become a valuable EHR database for secondary medical research. However, there was no prior phenotyping algorithm to extract HIV cases from MIMIC-IV, which requires a comprehensive knowledge of the database. Moreover, previous HIV phenotyping algorithms did not consider the new HIV-1/HIV-2 antibody differentiation immunoassay tests that MIMIC-IV contains. Our work provided insight into the structure and data elements in MIMIC-IV and proposed a new HIV phenotyping algorithm to fill in these gaps. The results included MIMIC-IV's data tables and elements used, 1,781 and 1,843 HIV cases from MIMIC-IV's versions 0.4 and 2.1, respectively, and summary statistics of these two HIV case cohorts. They could be used for the development of statistical and machine learning models in future studies about the disease.

AMIA Joint Summits on Translational Science proceedings, vol. 2024, pp. 555-564. Published 2024-05-31. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141847/pdf/
Citations: 0
Linking Cancer Clinical Trials to their Result Publications.
Evan Pan, Kirk Roberts

The results of clinical trials are a valuable source of evidence for researchers, policy makers, and healthcare professionals. However, online trial registries do not always contain links to the publications that report on their results, instead requiring a time-consuming manual search. Here, we explored the application of pre-trained transformer-based language models to automatically identify result-reporting publications of cancer clinical trials by computing dense vectors and performing semantic search. Models were fine-tuned on text data from trial registry fields and article metadata using a contrastive learning approach. The best performing model was PubMedBERT, which achieved a mean average precision of 0.592 and ranked 70.3% of a trial's publications in the top 5 results when tested on the holdout test trials. Our results suggest that semantic search using embeddings from transformer models may be an effective approach to the task of linking trials to their publications.

AMIA Joint Summits on Translational Science proceedings, vol. 2024, pp. 642-651. Published 2024-05-31. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141816/pdf/
Citations: 0
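
The trial-linking abstract above describes ranking candidate articles by dense-vector similarity and scoring the ranking with mean average precision and a top-5 measure. The Python sketch below shows one way such an evaluation could look. The two-dimensional vectors, article ids, and function names are invented for illustration; real embeddings would come from a fine-tuned model such as the one the abstract names, and reading the top-5 figure as recall at rank 5 is an assumption.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_articles(trial_vec, article_vecs):
    """Return article ids sorted by descending similarity to the trial vector."""
    return sorted(
        article_vecs,
        key=lambda article_id: cosine(trial_vec, article_vecs[article_id]),
        reverse=True,
    )


def average_precision(ranked_ids, relevant_ids):
    """Average of precision values at each rank where a relevant article appears."""
    hits, precision_sum = 0, 0.0
    for rank, article_id in enumerate(ranked_ids, start=1):
        if article_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0


def recall_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of the relevant articles that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    return len(set(ranked_ids[:k]) & set(relevant_ids)) / len(relevant_ids)


# Toy embeddings (invented): one trial and three candidate articles.
trial_vec = [1.0, 0.0]
article_vecs = {"a1": [0.9, 0.1], "a2": [0.0, 1.0], "a3": [0.6, 0.8]}
relevant = {"a1", "a2"}  # hypothetical ground-truth result publications

ranked = rank_articles(trial_vec, article_vecs)
ap = average_precision(ranked, relevant)
print(ranked, ap, recall_at_k(ranked, relevant, k=2))
```

Mean average precision is then the mean of `average_precision` over all test trials.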
CLASSify: A Web-Based Tool for Machine Learning.
Aaron D Mullen, Samuel E Armstrong, Jeff Talbert, V K Cody Bumgardner

Machine learning classification problems are widespread in bioinformatics, but the technical knowledge required to perform model training, optimization, and inference can prevent researchers from utilizing this technology. This article presents an automated tool for machine learning classification problems to simplify the process of training models and producing results while providing informative visualizations and insights into the data. This tool supports both binary and multiclass classification problems, and it provides access to a variety of models and methods. Synthetic data can be generated within the interface to fill missing values, balance class labels, or generate entirely new datasets. It also provides support for feature evaluation and generates explainability scores to indicate which features influence the output the most. We present CLASSify, an open-source tool for simplifying the user experience of solving classification problems without the need for knowledge of machine learning.

AMIA Joint Summits on Translational Science proceedings, vol. 2024, pp. 364-373. Published 2024-05-31. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141843/pdf/
Citations: 0
Mechanisms for Integrating Real Data into Search Game Simulations: An Application to Winter Health Service Pressures and Preventative Policies.
Martin Chapman, Abigail G-Medhin, Kian Daneshi, Tom Bramwell, Stevo Durbaba, Vasa Curcin, Divya Parmar, Harriet Boulding, Laia Becares, Craig Morgan, Mariam Molokhia, Peter McBurney, Seeromanie Harding, Ingrid Wolfe, Mark Ashworth, Lucilla Poston

While modelling and simulation are powerful techniques for exploring complex phenomena, if they are not coupled with suitable real-world data any results obtained are likely to require extensive validation. We consider this problem in the context of search game modelling, and suggest that both demographic and behaviour data are used to configure certain model parameters. We show this integration in practice by using a combined dataset of over 150,000 individuals to configure a specific search game model that captures the environment, population, interventions and individual behaviours relating to winter health service pressures. The presence of this data enables us to more accurately explore the potential impact of service pressure interventions, which we do across 33,000 simulations using a computational version of the model. We find government advice to be the best-performing intervention in simulation, in respect of improved health, reduced health inequalities, and thus reduced pressure on health service utilisation.

AMIA Joint Summits on Translational Science proceedings, vol. 2024, pp. 115-124. Published 2024-05-31. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141793/pdf/
Citations: 0
Real-Time Obstructive Sleep Apnea Detection from Raw ECG and SpO2 Signal Using Convolutional Neural Network.
Tanmoy Paul, Omiya Hassan, Syed K Islam, Abu S M Mosa

Obstructive sleep apnea is a sleep disorder that is linked with many health complications, and severe forms of apnea can even be lethal. Overnight polysomnography, the gold standard for diagnosing apnea, is expensive, time-consuming, and requires manual analysis by a sleep expert. Recently, numerous studies have demonstrated the application of artificial intelligence to detect apnea in real time. However, the majority of these studies apply data pre-processing and feature extraction techniques, resulting in longer inference times that make real-time detection inefficient. This study proposes a single convolutional neural network architecture that can automatically extract spatial features and detect apnea from both electrocardiogram (ECG) and blood-oxygen saturation (SpO2) signals. Using segments of 10 s, the network classified apnea with an accuracy of 94.2% and 96% for ECG and SpO2, respectively. Moreover, the overall performance of both models was consistent, with an AUC score of 0.99.

阻塞性睡眠呼吸暂停是一种睡眠障碍,与许多健康并发症有关,严重的呼吸暂停甚至可以致命。通宵多导睡眠图是诊断呼吸暂停的黄金标准,但这种方法昂贵、耗时,而且需要睡眠专家进行人工分析。最近有许多研究表明,人工智能可用于实时检测呼吸暂停。但这些研究大多采用数据预处理和特征提取技术,推理时间较长,导致实时检测系统效率低下。本研究提出了一种单一卷积神经网络架构,可从心电图(ECG)和血氧饱和度(SpO2)信号中自动提取空间特征并检测呼吸暂停。利用 10 秒的片段,该网络对心电图和 SpO2 信号进行呼吸暂停分类的准确率分别为 94.2% 和 96%。此外,两个模型的总体性能一致,AUC 得分为 0.99。
{"title":"Real-Time Obstructive Sleep Apnea Detection from Raw ECG and SpO<sub>2</sub> Signal Using Convolutional Neural Network.","authors":"Tanmoy Paul, Omiya Hassan, Syed K Islam, Abu S M Mosa","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Obstructive sleep apnea is a sleep disorder that is linked with many health complications and severe form of apnea can even be lethal. Overnight polysomnography is the gold standard for diagnosing apnea, which is expensive, time-consuming, and requires manual analysis by a sleep expert. Recently, there have been numerous studies demonstrating the application of artificial intelligence to detect apnea in real time. But the majority of these studies apply data pre-processing and feature extraction techniques resulting in a longer inference time that makes the real-time detection system inefficient. This study proposes a single convolutional neural network architecture that can automatically extract spatial features and detect apnea from both electrocardiogram (ECG) and blood-oxygen saturation (SpO<sub>2</sub>) signals. Using segments of 10s, the network classified apnea with an accuracy of 94.2% and 96% for ECG and SpO<sub>2</sub> respectively. Moreover, the overall performance of both models was consistent with an AUC score of 0.99.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. 
AMIA Joint Summits on Translational Science","volume":"2024 ","pages":"662-669"},"PeriodicalIF":0.0,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141842/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141201207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Automated Information Extraction from Thyroid Operation Narrative: A Comparative Study of GPT-4 and Fine-tuned KoELECTRA.
Dongsuk Jang, Hyeryun Park, Jiye Son, Hyeonuk Hwang, Su-Jin Kim, Jinwook Choi

In the rapidly evolving field of healthcare, the integration of artificial intelligence (AI) has become a pivotal component in the automation of clinical workflows, ushering in a new era of efficiency and accuracy. This study focuses on the transformative capabilities of the fine-tuned KoELECTRA model in comparison to the GPT-4 model, aiming to facilitate automated information extraction from thyroid operation narratives. The current research landscape is dominated by traditional methods heavily reliant on regular expressions, which often face challenges in processing free-style text formats containing critical details of operation records, including frozen biopsy reports. Addressing this, the study leverages advanced natural language processing (NLP) techniques to foster a paradigm shift towards more sophisticated data processing systems. Through this comparative study, we aspire to unveil a more streamlined, precise, and efficient approach to document processing in the healthcare domain, potentially revolutionizing the way medical data is handled and analyzed.

AMIA Joint Summits on Translational Science proceedings, vol. 2024, pp. 249-257. Published 2024-05-31. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11141853/pdf/
Citations: 0