Classical machine learning and deep learning models for Computer-Aided Diagnosis (CAD) commonly focus on overall classification performance, treating misclassification errors (false negatives and false positives) equally during training. This uniform treatment overlooks the distinct costs associated with each type of error, leading to suboptimal decision-making, particularly in the medical domain where it is important to improve the prediction sensitivity without significantly compromising overall accuracy. This study introduces a novel deep learning-based CAD system that incorporates a cost-sensitive parameter into the activation function. Applying our methodology to two medical imaging datasets, we show statistically significant increases in sensitivity of 3.84% and 5.4%, while maintaining overall accuracy, for the Lung Image Database Consortium (LIDC) and Breast Cancer Histological Database (BreakHis) datasets, respectively. Our findings underscore the significance of integrating cost-sensitive parameters into future CAD systems to optimize performance and ultimately reduce costs and improve patient outcomes.
Optimizing Computer-Aided Diagnosis with Cost-Aware Deep Learning Models. Charmi Patel, Yiyang Wang, Thiruvarangan Ramaraj, Roselyne Tchoua, Jacob Furst, Daniela Raicu. Pacific Symposium on Biocomputing, vol. 29, pp. 108-119, 2024-01-01. Journal Article.
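The abstract above does not give the form of the cost-sensitive activation function, so the following is not the authors' method. It is a minimal sketch of the underlying idea, unequal costs for false negatives and false positives, using the standard cost-weighted decision threshold for a calibrated classifier. The labels, scores, and cost values are invented for illustration and are not LIDC or BreakHis data.

```python
# Toy illustration only: labels, scores, and costs are assumptions,
# not data or parameters from the paper.

def cost_threshold(cost_fn: float, cost_fp: float) -> float:
    """Probability threshold that minimizes expected cost for calibrated scores.

    Predict positive when p * cost_fn > (1 - p) * cost_fp,
    i.e. when p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def confusion(y_true, probs, threshold):
    """Return (tp, fp, tn, fn), predicting positive when prob >= threshold."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, probs):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """True positive rate: fraction of actual positives that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(tp, fp, tn, fn):
    """Fraction of all cases classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


y_true = [1, 1, 1, 1, 0, 0, 0, 0]
probs = [0.9, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1, 0.05]

# Compare equal costs with a false negative costing three times a false positive.
for cost_fn, cost_fp in [(1.0, 1.0), (3.0, 1.0)]:
    t = cost_threshold(cost_fn, cost_fp)
    tp, fp, tn, fn = confusion(y_true, probs, t)
    print(f"threshold={t:.2f} "
          f"sensitivity={sensitivity(tp, fn):.2f} "
          f"accuracy={accuracy(tp, fp, tn, fn):.3f}")
```

On this toy data, raising the false-negative cost lowers the threshold from 0.50 to 0.25 and recovers the two positives that were missed at equal costs. Real data will generally show a trade-off between sensitivity and accuracy rather than a gain in both.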
Inyoung Jun, Sarah E Ser, Scott A Cohen, Jie Xu, Robert J Lucero, Jiang Bian, Mattia Prosperi
This study quantifies health outcome disparities in invasive Methicillin-Resistant Staphylococcus aureus (MRSA) infections by leveraging a novel artificial intelligence (AI) fairness algorithm, the Fairness-Aware Causal paThs (FACTS) decomposition, and applying it to real-world electronic health record (EHR) data. We spatiotemporally linked 9 years of EHRs from a large healthcare provider in Florida, USA, with contextual social determinants of health (SDoH). We first created a causal structure graph connecting SDoH with individual clinical measurements before/upon diagnosis of invasive MRSA infection, treatments, side effects, and outcomes; then, we applied FACTS to quantify outcome potential disparities of different causal pathways including SDoH, clinical and demographic variables. We found moderate disparity with respect to demographics and SDoH, and all the top ranked pathways that led to outcome disparities in age, gender, race, and income, included comorbidity. Prior kidney impairment, vancomycin use, and timing were associated with racial disparity, while income, rurality, and available healthcare facilities contributed to gender disparity. From an intervention standpoint, our results highlight the necessity of devising policies that consider both clinical factors and SDoH. In conclusion, this work demonstrates a practical utility of fairness AI methods in public health settings.
Quantifying Health Outcome Disparity in Invasive Methicillin-Resistant Staphylococcus aureus Infection using Fairness Algorithms on Real-World Data. Inyoung Jun, Sarah E Ser, Scott A Cohen, Jie Xu, Robert J Lucero, Jiang Bian, Mattia Prosperi. Pacific Symposium on Biocomputing, vol. 29, pp. 419-432, 2024-01-01. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10795837/pdf/
The problem of microdissection of heterogeneous tissue samples is of great interest for both fundamental biology and biomedical research. Until now, microdissection in the form of supervised deconvolution of mixed sequencing samples has been limited to assays measuring gene expression (RNA-seq) or chromatin accessibility (ATAC-seq). We present here the first attempt at solving the supervised deconvolution problem for run-on nascent sequencing data (GRO-seq and PRO-seq), a readout of active transcription. Then, we develop a novel filtering method suited to the mixed set of promoter and enhancer regions provided by nascent sequencing, and apply best-practice standards from the RNA-seq literature, using in-silico mixtures of cells. Using these methods, we find that enhancer RNAs are highly informative features for supervised deconvolution. In most cases, simple deconvolution methods perform better than more complex ones for solving the nascent deconvolution problem. Furthermore, undifferentiated cell types confound deconvolution of nascent sequencing data, likely as a consequence of transcriptional activity over the highly open chromatin regions of undifferentiated cell types. Our results suggest that while the problem of nascent deconvolution is generally tractable, stronger approaches integrating other sequencing protocols may be required to solve mixtures containing undifferentiated cell types.
Deconvolution of Nascent Sequencing Data Using Transcriptional Regulatory Elements. Zachary Maas, Rutendo Sigauke, Robin Dowell. Pacific Symposium on Biocomputing, vol. 29, pp. 564-578, 2024-01-01. Journal Article.
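The abstract above does not name the deconvolution methods it compares. As a minimal sketch of what a "simple" supervised deconvolution can look like, the example below models a mixed sample as a non-negative combination of per-cell-type reference profiles and estimates the mixing fractions by non-negative least squares, solved here with projected gradient descent. The signature matrix, the true fractions, and the function name `deconvolve` are assumptions made for illustration.

```python
# Hypothetical sketch: non-negative least squares via projected gradient descent.
# Not the authors' pipeline; all numbers are invented.

def deconvolve(signatures, mixture, n_iter=500):
    """Estimate cell-type fractions f >= 0 minimizing ||S f - m||^2.

    signatures: rows are regions (e.g. promoters or enhancers),
                columns are reference cell types.
    mixture:    observed signal per region in the mixed sample.
    """
    n_regions = len(signatures)
    n_types = len(signatures[0])
    # Step size 1 / ||S||_F^2 is no larger than 1 / lambda_max(S^T S),
    # which keeps the gradient iteration stable.
    step = 1.0 / sum(v * v for row in signatures for v in row)
    fractions = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [
            sum(s * f for s, f in zip(row, fractions)) - m
            for row, m in zip(signatures, mixture)
        ]
        gradient = [
            sum(signatures[i][k] * residual[i] for i in range(n_regions))
            for k in range(n_types)
        ]
        # Gradient step, then projection onto the non-negative orthant.
        fractions = [max(0.0, f - step * g) for f, g in zip(fractions, gradient)]
    total = sum(fractions)
    return [f / total for f in fractions] if total > 0 else fractions


# Four regions, two reference cell types (invented values).
signatures = [
    [10.0, 1.0],
    [8.0, 0.0],
    [1.0, 9.0],
    [0.0, 7.0],
]
true_fractions = [0.3, 0.7]
# In-silico mixture: a weighted sum of the reference profiles.
mixture = [sum(s * f for s, f in zip(row, true_fractions)) for row in signatures]

estimated = deconvolve(signatures, mixture)
print([round(f, 4) for f in estimated])
```

Because the mixture here is noise-free and built from the same signatures, the estimate recovers the mixing fractions. With real data, noise and correlated reference profiles make the problem harder.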
Protein-protein interactions play an essential role in nearly all biological processes, and it has become increasingly clear that in order to better understand the fundamental processes that underlie disease, we must develop a strong understanding of both their context specificity (e.g., tissue-specificity) as well as their dynamic nature (e.g., how they respond to environmental changes). While network-based approaches have found much initial success in the application of protein-protein interactions (PPIs) towards systems-level explorations of biology, they often overlook the fact that large numbers of proteins undergo alternative splicing. Alternative splicing has been shown not only to diversify protein function through the generation of multiple protein isoforms, but also to remodel PPIs and affect a wide range of diseases, including cancer. Isoform-specific interactions are not well characterized, so we develop a computational approach that uses domain-domain interactions in concert with differential exon usage data from The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression project (GTEx). Using this approach, we can characterize PPIs likely disrupted or possibly even increased due to splicing events for individual TCGA cancer patient samples relative to a matched GTEx normal tissue background.
Splitpea: quantifying protein interaction network rewiring changes due to alternative splicing in cancer. Ruth Dannenfelser, Vicky Yao. Pacific Symposium on Biocomputing, vol. 29, pp. 579-593, 2024-01-01. Journal Article.
Alena Orlenko, Philip J Freda, Attri Ghosh, Hyunjun Choi, Nicholas Matsumoto, Tiffani J Bright, Corey T Walker, Tayo Obafemi-Ajayi, Jason H Moore
This work demonstrates the use of cluster analysis in detecting fair and unbiased novel discoveries. Given a sample population of elective spinal fusion patients, we identify two overarching subgroups driven by insurance type. The Medicare group, associated with lower socioeconomic status, exhibited an over-representation of negative risk factors. The findings provide a compelling depiction of the interwoven socioeconomic and racial disparities present within the healthcare system, highlighting their consequential effects on health inequalities. The results are intended to guide design of fair and precise machine learning models based on intentional integration of population stratification.
Cluster Analysis reveals Socioeconomic Disparities among Elective Spine Surgery Patients. Alena Orlenko, Philip J Freda, Attri Ghosh, Hyunjun Choi, Nicholas Matsumoto, Tiffani J Bright, Corey T Walker, Tayo Obafemi-Ajayi, Jason H Moore. Pacific Symposium on Biocomputing, vol. 29, pp. 359-373, 2024-01-01. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11250986/pdf/
Rachel L Kember, Shefali S Verma, Anurag Verma, Brenda Xiao, Anastasia Lucas, Colleen M Kripke, Renae Judy, Jinbo Chen, Scott M Damrauer, Daniel J Rader, Marylyn D Ritchie
Polygenic risk scores (PRS) have predominantly been derived from genome-wide association studies (GWAS) conducted in European ancestry (EUR) individuals. In this study, we present an in-depth evaluation of PRS based on multi-ancestry GWAS for five cardiometabolic phenotypes in the Penn Medicine BioBank (PMBB) followed by a phenome-wide association study (PheWAS). We examine the PRS performance across all individuals and separately in African ancestry (AFR) and EUR ancestry groups. For AFR individuals, PRS derived using the multi-ancestry LD panel showed a higher effect size for four out of five PRSs (DBP, SBP, T2D, and BMI) than those derived from the AFR LD panel. In contrast, for EUR individuals, the multi-ancestry LD panel PRS demonstrated a higher effect size for two out of five PRSs (SBP and T2D) compared to the EUR LD panel. These findings underscore the potential benefits of utilizing a multi-ancestry LD panel for PRS derivation in diverse genetic backgrounds and demonstrate overall robustness in all individuals. Our results also revealed significant associations between PRS and various phenotypic categories. For instance, CAD PRS was linked with 18 phenotypes in AFR and 82 in EUR, while T2D PRS correlated with 84 phenotypes in AFR and 78 in EUR. Notably, associations like hyperlipidemia, renal failure, atrial fibrillation, coronary atherosclerosis, obesity, and hypertension were observed across different PRSs in both AFR and EUR groups, with varying effect sizes and significance levels. However, in AFR individuals, the strength and number of PRS associations with other phenotypes were generally reduced compared to EUR individuals. Our study underscores the need for future research to prioritize 1) conducting GWAS in diverse ancestry groups and 2) creating a cosmopolitan PRS methodology that is universally applicable across all genetic backgrounds. Such advances will foster a more equitable and personalized approach to precision medicine.
Polygenic risk scores for cardiometabolic traits demonstrate importance of ancestry for predictive precision medicine. Rachel L Kember, Shefali S Verma, Anurag Verma, Brenda Xiao, Anastasia Lucas, Colleen M Kripke, Renae Judy, Jinbo Chen, Scott M Damrauer, Daniel J Rader, Marylyn D Ritchie. Pacific Symposium on Biocomputing, vol. 29, pp. 611-626, 2024-01-01. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10947742/pdf/
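The abstract above does not describe how the scores were computed, and the LD-panel step it discusses (which shapes the per-variant weights) is not shown here. As a minimal sketch of the general definition only, a polygenic risk score is a weighted sum of effect-allele dosages, usually standardized within a cohort before effect sizes are compared. The variant names, weights, and genotypes below are hypothetical.

```python
# Hypothetical sketch of the generic PRS definition; the variant IDs, weights,
# and genotypes are invented and do not come from the Penn Medicine BioBank.
from statistics import mean, pstdev


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1, or 2 copies per variant).

    Variants missing from `dosages` are treated as dosage 0, which is a
    simplifying assumption; real pipelines often impute instead.
    """
    return sum(weight * dosages.get(variant, 0.0)
               for variant, weight in weights.items())


def standardize(scores):
    """Convert raw scores to z-scores within the cohort."""
    mu = mean(scores)
    sd = pstdev(scores)
    if sd == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sd for s in scores]


# Per-variant effect sizes, as would come from GWAS summary statistics.
weights = {"variant_a": 0.20, "variant_b": -0.10, "variant_c": 0.05}

cohort = [
    {"variant_a": 2, "variant_b": 1, "variant_c": 0},
    {"variant_a": 0, "variant_b": 0, "variant_c": 1},
    {"variant_a": 1, "variant_b": 2, "variant_c": 2},
]

raw_scores = [polygenic_score(person, weights) for person in cohort]
z_scores = standardize(raw_scores)
print([round(s, 3) for s in raw_scores])
print([round(z, 3) for z in z_scores])
```

Standardizing within each ancestry group, rather than across the whole cohort, is one design choice that affects how effect sizes compare between groups. The sketch makes no claim about which the study used.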
Data from digital health technologies (DHT), including wearable sensors like Apple Watch, Whoop, Oura Ring, and Fitbit, are increasingly being used in biomedical research. Research and development of DHT-related devices, platforms, and applications is happening rapidly and with significant private-sector involvement with new biotech companies and large tech companies (e.g. Google, Apple, Amazon, Uber) investing heavily in technologies to improve human health. Many academic institutions are building capabilities related to DHT research, often in cross-sector collaboration with technology companies and other organizations with the goal of generating clinically meaningful evidence to improve patient care, to identify users at an earlier stage of disease presentation, and to support health preservation and disease prevention. Large research consortia, cross-sector partnerships, and individual research labs are all represented in the current corpus of published studies. Some of the large research studies, like NIH's All of Us Research Program, make data sets from wearable sensors available to the research community, while the vast majority of data from wearable sensors and other DHTs are held by private sector organizations and are not readily available to the research community. As data are unlocked from the private sector and made available to the academic research community, there is an opportunity to develop innovative analytics and methods through expanded access. This is the second year for this Session which solicited research results leveraging digital health technologies, including wearable sensor data, describing novel analytical methods, and issues related to diversity, equity, inclusion (DEI) of the research, data, and the community of researchers working in this area. We particularly encouraged submissions describing opportunities for expanding and democratizing academic research using data from wearable sensors and related digital health technologies.
Session Introduction: Digital health technology data in biocomputing: Research efforts and considerations for expanding access (PSB2024). Michelle Holko, Chris Lunt, Jessilyn Dunn. Pacific Symposium on Biocomputing, vol. 29, pp. 163-169, 2024-01-01. Journal Article.
Alejandro Lozano, Scott L Fleming, Chia-Chun Chiang, Nigam Shah
The rapidly expanding body of published medical literature makes it challenging for clinicians and researchers to keep up with and summarize recent, relevant findings in a timely manner. While several closed-source summarization tools based on large language models (LLMs) now exist, rigorous and systematic evaluations of their outputs are lacking. Furthermore, there is a paucity of high-quality datasets and appropriate benchmark tasks with which to evaluate these tools. We address these issues with four contributions: we release Clinfo.ai, an open-source WebApp that answers clinical questions based on dynamically retrieved scientific literature; we specify an information retrieval and abstractive summarization task to evaluate the performance of such retrieval-augmented LLM systems; we release a dataset of 200 questions and corresponding answers derived from published systematic reviews, which we name PubMed Retrieval and Synthesis (PubMedRS-200); and we report benchmark results for Clinfo.ai and other publicly available OpenQA systems on PubMedRS-200.
Clinfo.ai: An Open-Source Retrieval-Augmented Large Language Model System for Answering Medical Questions using Scientific Literature. Alejandro Lozano, Scott L Fleming, Chia-Chun Chiang, Nigam Shah. Pacific Symposium on Biocomputing, vol. 29, pp. 8-23, 2024-01-01. Journal Article.
Brooke L Fridley, Simon Vandekar, Inna Chervoneva, Julia Wrobel, Siyuan Ma
Immune modulation is considered a hallmark of cancer initiation and progression, with immune cell density being consistently associated with clinical outcomes of individuals with cancer. Multiplex immunofluorescence (mIF) microscopy combined with automated image analysis is a novel and increasingly used technique that allows for the assessment and visualization of the tumor microenvironment (TME). Recently, application of this new technology to tissue microarrays (TMAs) or whole tissue sections from large cancer studies has been used to characterize different cell populations in the TME with enhanced reproducibility and accuracy. Generally, mIF data has been used to examine the presence and abundance of immune cells in the tumor and stroma compartments; however, this aggregate measure assumes uniform patterns of immune cells throughout the TME and overlooks spatial heterogeneity. Recently, the spatial contexture of the TME has been explored with a variety of statistical methods. In this PSB workshop, speakers will present some of the state-of-the-art statistical methods for assessing the tumor immune microenvironment (TIME) from mIF data.
{"title":"Statistical analysis of single-cell protein data.","authors":"Brooke L Fridley, Simon Vandekar, Inna Chervoneva, Julia Wrobel, Siyuan Ma","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Immune modulation is considered a hallmark of cancer initiation and progression, with immune cell density being consistently associated with clinical outcomes of individuals with cancer. Multiplex immunofluorescence (mIF) microscopy combined with automated image analysis is a novel and increasingly used technique that allows for the assessment and visualization of the tumor microenvironment (TME). Recently, application of this new technology to tissue microarrays (TMAs) or whole tissue sections from large cancer studies has been used to characterize different cell populations in the TME with enhanced reproducibility and accuracy. Generally, mIF data has been used to examine the presence and abundance of immune cells in the tumor and stroma compartments; however, this aggregate measure assumes uniform patterns of immune cells throughout the TME and overlooks spatial heterogeneity. Recently, the spatial contexture of the TME has been explored with a variety of statistical methods. In this PSB workshop, speakers will present some of the state-of-the-art statistical methods for assessing the TIME from mIF data.</p>","PeriodicalId":34954,"journal":{"name":"Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing","volume":"29 ","pages":"654-660"},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139075249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Milos Vukadinovic, Gauri Renjith, Victoria Yuan, Alan Kwan, Susan C Cheng, Debiao Li, Shoa L Clarke, David Ouyang
Recent research has effectively used quantitative traits from imaging to boost the capabilities of genome-wide association studies (GWAS), providing further understanding of disease biology and various traits. However, phenotyping inherently carries measurement error and noise that could influence subsequent genetic analyses. This study focused on left ventricular ejection fraction (LVEF), a vital yet potentially inaccurate quantitative measurement, to investigate how imprecision in phenotype measurement affects genetic studies. Several methods of acquiring LVEF, along with simulated measurement noise, were assessed for their effects on downstream genetic analyses. The results showed that introducing just 7.9% measurement noise could eliminate all genetic associations in an LVEF GWAS of almost forty thousand individuals. Moreover, a 1% increase in the mean absolute error (MAE) of LVEF had the same effect on GWAS power as a 10% reduction in the cohort's sample size. Therefore, enhancing the accuracy of phenotyping is crucial to maximizing the effectiveness of genome-wide association studies.
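The trade-off between phenotype noise and sample size can be sketched with classical measurement-error reasoning. This is an illustrative approximation, not the authors' analysis: it assumes independent additive noise and a small per-variant effect, so that the association test's non-centrality parameter is roughly the sample size times the variance explained, and noise shrinks the variance explained by the phenotype's reliability. The function names and all numeric inputs are hypothetical, and the sketch does not model MAE or reproduce the 7.9% figure reported above.

```python
def reliability(true_var: float, noise_var: float) -> float:
    """Fraction of the observed phenotype variance that is true signal."""
    return true_var / (true_var + noise_var)


def effective_sample_size(n: int, true_var: float, noise_var: float) -> float:
    """Approximate size of a noise-free cohort with the same GWAS power.

    Assumes the test's non-centrality parameter is about n * r2 and that
    independent additive noise multiplies r2 by the reliability.
    """
    return n * reliability(true_var, noise_var)


def noise_var_for_sample_loss(true_var: float, loss_fraction: float) -> float:
    """Noise variance that costs the given fraction of effective sample size."""
    return true_var * loss_fraction / (1.0 - loss_fraction)


# Hypothetical cohort of 40,000 with a unit-variance true phenotype.
n_cohort = 40_000
n_eff = effective_sample_size(n_cohort, true_var=1.0, noise_var=0.25)
print(f"effective sample size with noise variance 0.25: {n_eff:.0f}")

# Noise variance that would be equivalent to losing 10% of the cohort.
needed = noise_var_for_sample_loss(true_var=1.0, loss_fraction=0.10)
print(f"noise variance equivalent to a 10% sample loss: {needed:.4f}")
```

Under these assumptions, noise acts like discarding participants: a reliability of 0.8 leaves a 40,000-person cohort with roughly the power of 32,000 noise-free measurements.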
{"title":"Impact of Measurement Noise on Genetic Association Studies of Cardiac Function.","authors":"Milos Vukadinovic, Gauri Renjith, Victoria Yuan, Alan Kwan, Susan C Cheng, Debiao Li, Shoa L Clarke, David Ouyang","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Recent research has effectively used quantitative traits from imaging to boost the capabilities of genome-wide association studies (GWAS), providing further understanding of disease biology and various traits. However, it's important to note that phenotyping inherently carries measurement error and noise that could influence subsequent genetic analyses. The study focused on left ventricular ejection fraction (LVEF), a vital yet potentially inaccurate quantitative measurement, to investigate how imprecision in phenotype measurement affects genetic studies. Several methods of acquiring LVEF, along with simulating measurement noise, were assessed for their effects on ensuing genetic analyses. The results showed that by introducing just 7.9% of measurement noise, all genetic associations in an LVEF GWAS with almost forty thousand individuals could be eliminated. Moreover, a 1% increase in mean absolute error (MAE) in LVEF had an effect equivalent to a 10% reduction in the sample size of the cohort on the power of GWAS. Therefore, enhancing the accuracy of phenotyping is crucial to maximize the effectiveness of genome-wide association studies.</p>","PeriodicalId":34954,"journal":{"name":"Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing","volume":"29 ","pages":"134-147"},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139075171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}