
Pacific Symposium on Biocomputing: Latest Articles

Scalar-Function Causal Discovery for Generating Causal Hypotheses with Observational Wearable Device Data.
Valeriya Rogovchenko, Austin Sibu, Yang Ni

Digital health technologies such as wearable devices have transformed health data analytics, providing continuous, high-resolution functional data on various health metrics and thereby opening new avenues for innovative research. In this work, we introduce a new approach for generating causal hypotheses for a pair consisting of a continuous functional variable (e.g., physical activity recorded over time) and a binary scalar variable (e.g., a mobility condition indicator). Our method goes beyond traditional association-focused approaches and has the potential to reveal the underlying causal mechanism. We show theoretically that the proposed scalar-function causal model is identifiable from observational data alone. This identifiability result justifies a simple yet principled algorithm that discerns the causal relationship by comparing the likelihood functions of the competing causal hypotheses. The robustness and applicability of our method are demonstrated through simulation studies and a real-world application using wearable device data from the National Health and Nutrition Examination Survey.
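
A minimal sketch of likelihood-based direction selection, assuming (for illustration only) that the functional variable has already been reduced to one scalar summary x, such as a first functional principal component score. The two models below are hypothetical stand-ins, not the authors' scalar-function causal model: under X -> Y, x is Gaussian and y | x is logistic; under Y -> X, y is Bernoulli and x | y is Gaussian with a class-specific mean and shared variance. Both have four parameters, and because the implied marginal of x is Gaussian under one hypothesis and a two-component mixture under the other, their maximized likelihoods generally differ.

```python
import numpy as np

LOG_2PI = np.log(2.0 * np.pi)


def gaussian_loglik(x: np.ndarray, mu, var: float) -> float:
    """Log-likelihood of x under N(mu, var); mu may be a scalar or a per-sample array."""
    return float(-0.5 * np.sum(LOG_2PI + np.log(var) + (x - mu) ** 2 / var))


def fit_logistic(x: np.ndarray, y: np.ndarray, iters: int = 50) -> np.ndarray:
    """Newton-Raphson MLE for P(y = 1 | x) = sigmoid(w0 + w1 * x)."""
    design = np.column_stack([np.ones_like(x), x])
    w = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-design @ w))
        grad = design.T @ (y - p)
        hess = design.T @ (design * (p * (1.0 - p))[:, None])
        w = w + np.linalg.solve(hess, grad)
    return w


def loglik_x_causes_y(x: np.ndarray, y: np.ndarray) -> float:
    """Hypothetical H1 (X -> Y): x ~ N(mu, s^2), y | x ~ logistic regression."""
    ll_x = gaussian_loglik(x, x.mean(), x.var())  # x.var() is the MLE (ddof=0)
    z = np.column_stack([np.ones_like(x), x]) @ fit_logistic(x, y)
    ll_y_given_x = float(np.sum(y * z - np.logaddexp(0.0, z)))  # stable Bernoulli log-lik
    return ll_x + ll_y_given_x


def loglik_y_causes_x(x: np.ndarray, y: np.ndarray) -> float:
    """Hypothetical H2 (Y -> X): y ~ Bernoulli(p), x | y ~ N(mu_y, s^2)."""
    p = y.mean()
    ll_y = float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))
    mu = np.where(y == 1, x[y == 1].mean(), x[y == 0].mean())
    var = float(np.mean((x - mu) ** 2))  # pooled MLE variance
    return ll_y + gaussian_loglik(x, mu, var)
```

The hypothesis with the larger maximized log-likelihood is the one this toy comparison favors; with equal parameter counts, no complexity penalty is applied. For data simulated so that a binary y shifts the mean of x (for example `x = 3 * y + noise`), the Y -> X log-likelihood is the larger one.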

Pacific Symposium on Biocomputing, vol. 29, pp. 201-213 (2024). Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10764070/pdf/
A Conversational Agent for Early Detection of Neurotoxic Effects of Medications through Automated Intensive Observation.
Serguei Pakhomov, Jacob Solinsky, Martin Michalowski, Veronika Bachanova

We present a fully automated AI-based system for intensive monitoring of the cognitive symptoms of neurotoxicity that frequently arise as a result of immunotherapy of hematologic malignancies. Early manifestations of these symptoms, such as mild aphasia and confusion, are evident in the patient's speech and can be detected and effectively treated before more serious and potentially life-threatening impairment sets in. We have developed the Automated Neural Nursing Assistant (ANNA), a system designed to conduct a brief cognitive assessment over the telephone several times per day for 5-14 days following infusion of the immunotherapy medication. ANNA uses a conversational agent based on a large language model to elicit spontaneous speech in a semi-structured dialogue, followed by a series of brief language-based neurocognitive tests. In this paper we describe ANNA's design and implementation, report the results of a pilot functional evaluation study, and discuss the technical and logistical challenges facing the introduction of this type of technology into clinical practice. A large-scale clinical evaluation of ANNA will be conducted in an observational study of patients undergoing immunotherapy at the University of Minnesota Masonic Cancer Center starting in fall 2023.
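
The monitoring protocol (several brief calls per day for 5-14 days after infusion) can be sketched as a call-schedule generator. Apart from the 5-14 day range, every detail here is a hypothetical assumption rather than ANNA's documented behavior: the function name `build_call_schedule`, starting on the day after infusion, three calls per day, and evenly spaced calls within a 09:00 to 19:00 window.

```python
from datetime import datetime, timedelta


def build_call_schedule(
    infusion: datetime,
    days: int,
    calls_per_day: int = 3,
    window_start_hour: int = 9,
    window_end_hour: int = 19,
) -> list:
    """Return evenly spaced assessment call times for each monitoring day.

    Hypothetical policy: monitoring starts the day after infusion, and calls
    are spread evenly from window_start_hour to window_end_hour (inclusive).
    """
    if not 5 <= days <= 14:
        raise ValueError("monitoring period must be 5-14 days")
    if calls_per_day < 1 or window_end_hour <= window_start_hour:
        raise ValueError("need at least one call and a non-empty calling window")
    window_minutes = (window_end_hour - window_start_hour) * 60
    step = window_minutes / max(calls_per_day - 1, 1)  # minutes between calls
    first_day = infusion.replace(hour=0, minute=0, second=0, microsecond=0) + timedelta(days=1)
    schedule = []
    for day in range(days):
        window_open = first_day + timedelta(days=day, hours=window_start_hour)
        for k in range(calls_per_day):
            schedule.append(window_open + timedelta(minutes=round(k * step)))
    return schedule
```

For example, an infusion at 14:30 on 2 October 2023 with `days=5` yields 15 calls, the first at 09:00 on 3 October and the last at 19:00 on 7 October.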

Pacific Symposium on Biocomputing, vol. 29, pp. 24-38 (2024).
Creation of a Curated Database of Experimentally Determined Human Protein Structures for the Identification of Its Targetome.
Armand Ovanessians, Carson Snow, Thomas Jennewein, Susanta Sarkar, Gil Speyer, Judith Klein-Seetharaman

Assembling an "integrated structural map of the human cell" at atomic resolution will require a complete set of human protein structures available for interaction with other biomolecules (the human protein structure targetome) and a pipeline of automated tools that allows quantitative analysis of millions of protein-ligand interactions. Toward this goal, here we describe the creation of a curated database of experimentally determined human protein structures. Starting from the sequences of 20,422 human proteins, we selected the most representative structure for each protein (where available) from the Protein Data Bank (PDB), ranking structures by sequence coverage, depth (the difference between the final and initial residue numbers of each chain), resolution, and the experimental method used to determine the structure. To enable expansion into an entire human targetome, we docked small-molecule ligands to our curated set of protein structures. Based on design constraints derived from comparing structure-assembly and ligand-docking results for challenging protein examples, we propose to combine this curated database of experimental structures with AlphaFold predictions and with multi-domain assembly using DEMO2 in future work. To demonstrate the utility of the curated database for identifying the human protein structure targetome, we performed docking with AutoDock Vina and created tools for automated analysis of the binding affinities and binding-site locations across thousands of protein-ligand predictions. The resulting human targetome, which can be updated and expanded as the curated database evolves and the number of ligands grows, is a valuable addition to the growing toolkit of structural bioinformatics.
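
The selection step can be read as a sort over candidate PDB entries. In this sketch, treating the four criteria as strictly ordered (coverage first, then depth, resolution, and method), the method preference table, and the `Candidate` fields are assumptions; the abstract names the criteria but not how they are combined. The PDB IDs in the test data are made up.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical method preference (lower is better); not specified in the abstract.
METHOD_PREFERENCE = {"X-RAY DIFFRACTION": 0, "ELECTRON MICROSCOPY": 1, "SOLUTION NMR": 2}


@dataclass(frozen=True)
class Candidate:
    pdb_id: str
    covered_residues: int        # residues of the target sequence observed in the chain
    first_residue: int           # initial residue number of the chain
    last_residue: int            # final residue number of the chain
    resolution: Optional[float]  # in angstroms; None when not reported (e.g., NMR)
    method: str

    @property
    def depth(self) -> int:
        # "Depth" as defined in the abstract: final minus initial residue number.
        return self.last_residue - self.first_residue


def rank_key(c: Candidate, sequence_length: int) -> tuple:
    """Lexicographic key: higher coverage, then greater depth, then smaller
    (better) resolution, then preferred experimental method."""
    coverage = c.covered_residues / sequence_length
    resolution = c.resolution if c.resolution is not None else float("inf")
    method_rank = METHOD_PREFERENCE.get(c.method, len(METHOD_PREFERENCE))
    return (-coverage, -c.depth, resolution, method_rank)


def most_representative(candidates: list, sequence_length: int) -> Optional[Candidate]:
    """Pick the top-ranked structure, or None if the protein has no PDB entry."""
    if not candidates:
        return None
    return min(candidates, key=lambda c: rank_key(c, sequence_length))
```

A weighted score would be an equally plausible reading of the abstract; the lexicographic key is simply the easiest to audit, since each criterion only breaks ties left by the previous one.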

Pacific Symposium on Biocomputing, vol. 29, pp. 291-305 (2024).
Enhancing Spatial Transcriptomics Analysis by Integrating Image-Aware Deep Learning Methods.
Jiarong Song, Josh Lamstein, Vivek Gopal Ramaswamy, Michelle Webb, Gabriel Zada, Steven Finkbeiner, David W Craig

Spatial transcriptomics (ST) represents a pivotal advance in biomedical research: it enables transcriptional profiling of cells within their morphological context and provides a key tool for understanding spatial heterogeneity in cancer tissues. However, current analytical approaches, like those for single-cell analysis, depend largely on gene expression and underuse the rich morphological information inherent in the tissue. We present a novel method that integrates spatial transcriptomics and histopathological image data to better capture biologically meaningful patterns in patient data, focusing on aggressive cancer types such as glioblastoma and triple-negative breast cancer. We used a ResNet-based deep learning model to extract key morphological features from high-resolution whole-slide histology images. Spot-level, PCA-reduced vectors of both the ResNet-50 image features and the spatial gene expression data were used in Louvain clustering to enable image-aware feature discovery. Features from image-aware clustering successfully pinpointed key biological features identified by manual histopathology, such as regions of fibrosis and necrosis, and improved edge definition in EGFR-rich areas. Importantly, our combinatorial approach revealed crucial characteristics seen in histopathology that gene-expression-only analysis had missed. Supplemental Material: https://github.com/davcraig75/song_psb2014/blob/main/SupplementaryData.pdf.
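
The image-aware feature construction can be sketched as follows, under stated assumptions: each spot has a 2048-dimensional image embedding (the size of ResNet-50's pooled output) and an expression vector; each modality is PCA-reduced separately, scaled to equal total variance, concatenated, and turned into a k-nearest-neighbour graph for community detection. The component counts, the equal-variance weighting, the kNN graph construction, and k are hypothetical choices, not the paper's settings.

```python
import numpy as np


def pca_scores(matrix: np.ndarray, n_components: int) -> np.ndarray:
    """Project rows (spots) onto the top principal components via SVD."""
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def fuse_modalities(image_feats, expression, n_image=10, n_expr=10):
    """Concatenate per-modality PCA scores, each block scaled to unit total
    variance so that neither modality dominates distances (an assumption)."""
    blocks = []
    for mat, k in ((image_feats, n_image), (expression, n_expr)):
        pcs = pca_scores(mat, k)
        blocks.append(pcs / np.sqrt(pcs.var(axis=0).sum()))
    return np.hstack(blocks)


def knn_adjacency(features: np.ndarray, k: int = 5) -> np.ndarray:
    """Symmetric k-nearest-neighbour graph (Euclidean distance), no self-loops."""
    sq = (features ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    np.fill_diagonal(dist, np.inf)  # exclude each spot from its own neighbours
    neighbours = np.argsort(dist, axis=1)[:, :k]
    n = features.shape[0]
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), k), neighbours.ravel()] = True
    return adj | adj.T
```

The boolean adjacency matrix could then be handed to any Louvain implementation (for example, NetworkX's `louvain_communities`); that step is left out to keep the sketch dependency-free.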

Pacific Symposium on Biocomputing, vol. 29, pp. 450-463 (2024).
Polygenic risk scores for cardiometabolic traits demonstrate importance of ancestry for predictive precision medicine.
Published: 2023-12-17. DOI: 10.1142/9789811286421_0046
R. Kember, S. Verma, A. Verma, B. Xiao, Anastasia Lucas, Colleen M Kripke, R. Judy, Jinbo Chen, S. Damrauer, D. J. Rader, Marylyn D. Ritchie
Polygenic risk scores (PRS) have predominantly been derived from genome-wide association studies (GWAS) conducted in individuals of European ancestry (EUR). In this study, we present an in-depth evaluation of PRS based on multi-ancestry GWAS for five cardiometabolic phenotypes in the Penn Medicine BioBank (PMBB), followed by a phenome-wide association study (PheWAS). We examine PRS performance across all individuals and separately in the African ancestry (AFR) and EUR groups. For AFR individuals, PRS derived using the multi-ancestry linkage disequilibrium (LD) reference panel showed a higher effect size than those derived using the AFR LD panel for four of the five PRSs (diastolic blood pressure [DBP], systolic blood pressure [SBP], type 2 diabetes [T2D], and body mass index [BMI]). In contrast, for EUR individuals, the multi-ancestry LD panel PRS showed a higher effect size than the EUR LD panel PRS for two of the five PRSs (SBP and T2D). These findings underscore the potential benefits of using a multi-ancestry LD panel for PRS derivation in diverse genetic backgrounds and demonstrate overall robustness across all individuals. Our results also revealed significant associations between PRS and various phenotypic categories. For instance, the coronary artery disease (CAD) PRS was associated with 18 phenotypes in AFR and 82 in EUR individuals, while the T2D PRS was associated with 84 phenotypes in AFR and 78 in EUR individuals. Notably, associations with hyperlipidemia, renal failure, atrial fibrillation, coronary atherosclerosis, obesity, and hypertension were observed across different PRSs in both the AFR and EUR groups, with varying effect sizes and significance levels. However, the strength and number of PRS associations with other phenotypes were generally lower in AFR than in EUR individuals. Our study underscores the need for future research to prioritize 1) conducting GWAS in diverse ancestry groups and 2) creating a cosmopolitan PRS methodology that is universally applicable across all genetic backgrounds. Such advances will foster a more equitable and personalized approach to precision medicine.
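
A PRS is a weighted sum of effect-allele dosages, and effect sizes like those compared above are commonly reported per standard deviation of the score. The sketch below assumes a quantitative trait (such as SBP) and plain linear regression without covariates; binary traits such as T2D or CAD would typically use logistic regression, and real analyses usually adjust for covariates such as age, sex, and genetic principal components. How the weights are derived from GWAS summary statistics and an LD panel is not shown.

```python
import numpy as np


def polygenic_score(dosages: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """PRS_i = sum_j w_j * g_ij, where g_ij in {0, 1, 2} counts effect alleles."""
    return dosages @ weights


def effect_per_sd(prs: np.ndarray, trait: np.ndarray) -> float:
    """OLS slope of a quantitative trait on the standardized PRS,
    i.e., the trait change per 1 SD increase in PRS (no covariates)."""
    z = (prs - prs.mean()) / prs.std()
    design = np.column_stack([np.ones_like(z), z])
    coef, *_ = np.linalg.lstsq(design, trait, rcond=None)
    return float(coef[1])
```

Standardizing the score before regression is what makes effect sizes comparable across PRSs built from different LD panels, since raw scores from different weight sets are on different scales.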
Pacific Symposium on Biocomputing, pp. 611-626.
VetLLM: Large Language Model for Predicting Diagnosis from Veterinary Notes.
Published: 2023-12-17. DOI: 10.1142/9789811286421_0010
Yixing Jiang, Jeremy Irvin, Andrew Y. Ng, James Zou
Lack of diagnosis coding is a barrier to leveraging veterinary notes for medical and public health research. Previous work has been limited to developing specialized rule-based or customized supervised learning models to predict diagnosis codes, which is tedious and not easily transferable. In this work, we show that open-source large language models (LLMs) pretrained on a general corpus can achieve reasonable performance in a zero-shot setting. Alpaca-7B achieves a zero-shot F1 of 0.538 on the CSU test data and 0.389 on the PP test data, two standard benchmarks for coding from veterinary notes. Furthermore, with appropriate fine-tuning, the performance of LLMs can be boosted substantially, exceeding that of strong state-of-the-art supervised models. VetLLM, which is fine-tuned from Alpaca-7B using just 5,000 veterinary notes, achieves an F1 of 0.747 on the CSU test data and 0.637 on the PP test data. Notably, our fine-tuning is data-efficient: fine-tuning on only 200 notes can outperform supervised models trained on more than 100,000 notes. These findings demonstrate the great potential of LLMs for language processing tasks in medicine, and we advocate this new paradigm for processing clinical text.
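
Scoring an LLM as a diagnosis coder involves two steps: mapping its free-text answer onto the label set, then comparing predicted and gold label sets. A minimal sketch follows; the delimiter-based output format, the label names in the test data, and the choice of micro-averaging are assumptions, since the abstract does not state which F1 average is reported.

```python
import re


def parse_prediction(text: str, label_set: set) -> set:
    """Map free-text LLM output to known diagnosis labels (case-insensitive).

    Assumes the model lists labels separated by semicolons, commas, or newlines;
    anything not in label_set is discarded.
    """
    lookup = {label.lower(): label for label in label_set}
    parts = (part.strip().lower() for part in re.split(r"[;,\n]", text))
    return {lookup[part] for part in parts if part in lookup}


def micro_f1(gold: list, pred: list) -> float:
    """Micro-averaged F1 over per-note sets of diagnosis labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same notes")
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Discarding out-of-vocabulary strings in `parse_prediction` penalizes a zero-shot model only through lower recall; a stricter evaluation might instead count them as false positives.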
Pacific Symposium on Biocomputing, pp. 120-133.
Combined kinome inhibition states are predictive of cancer cell line sensitivity to kinase inhibitor combination therapies.
Published: 2023-12-17. DOI: 10.1142/9789811286421_0022
Chinmaya U. Joisa, Kevin A Chen, Samantha Beville, T. Stuhlmiller, Matthew E. Berginski, Denis O Okumu, B. Golitz, M. East, Gary L Johnson, Shawn M Gomez
Protein kinases are a primary focus of targeted therapy development for cancer, owing to their role as regulators in nearly all areas of cell life. Recent strategies targeting the kinome with combination therapies, such as trametinib plus dabrafenib in advanced melanoma, have shown promise, but empirical design for less-characterized pathways remains a challenge. Computational combination screening is an attractive alternative: in silico filtering drastically reduces the number of leads that must be tested experimentally, increasing the efficiency and effectiveness of drug development pipelines. In this work, we generated combined kinome inhibition states for 40,000 kinase inhibitor combinations from kinobeads-based kinome profiling across 64 doses. We then integrated these with transcriptomics from the Cancer Cell Line Encyclopedia (CCLE) to build machine learning models with elastic-net feature selection that predict cell line sensitivity across nine cancer types, with R² ≈ 0.75-0.9. We then validated the model on a triple-negative breast cancer (TNBC) cell line derived from a patient-derived xenograft (PDX) and observed good global accuracy (R² ≈ 0.7), as well as high accuracy in predicting synergy as measured by four popular metrics (R² ≈ 0.9). Additionally, the model predicted a highly synergistic combination of trametinib and omipalisib for TNBC treatment, a combination that was recently in phase I clinical trials. Our choice of tree-based models for greater interpretability allowed interrogation of highly predictive kinases in each cancer type, such as the MAPK, CDK, and STK kinases. Overall, these results suggest that the kinome inhibition states of kinase inhibitor combinations are strongly predictive of cell line responses and have great potential for integration into computational drug screening pipelines. This approach may facilitate the identification of effective kinase inhibitor combinations and accelerate the development of novel cancer therapies, ultimately improving patient outcomes.
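
The abstract does not name its four synergy metrics. Bliss independence and highest single agent (HSA) are two widely used reference models (Loewe additivity and ZIP are others), so the sketch below shows those two plus the R² used to report accuracy; whether these match the paper's exact definitions is an assumption. Effects are fractional inhibitions in [0, 1], and a positive excess indicates synergy under the given reference model.

```python
def _check_fraction(*values: float) -> None:
    for v in values:
        if not 0.0 <= v <= 1.0:
            raise ValueError(f"inhibition fraction out of [0, 1]: {v}")


def bliss_excess(e_a: float, e_b: float, e_ab: float) -> float:
    """Observed combined inhibition minus the Bliss-independence expectation."""
    _check_fraction(e_a, e_b, e_ab)
    return e_ab - (e_a + e_b - e_a * e_b)


def hsa_excess(e_a: float, e_b: float, e_ab: float) -> float:
    """Observed combined inhibition minus the highest single-agent effect."""
    _check_fraction(e_a, e_b, e_ab)
    return e_ab - max(e_a, e_b)


def r_squared(y_true, y_pred) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

For instance, single-agent inhibitions of 0.3 and 0.4 give a Bliss expectation of 0.58, so an observed combined inhibition of 0.7 has a Bliss excess of 0.12 and an HSA excess of 0.3.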
Pacific Symposium on Biocomputing, pp. 276-290.
Citations: 0
FedBrain: Federated Training of Graph Neural Networks for Connectome-based Brain Imaging Analysis.
Q2 Computer Science Pub Date : 2023-12-17 DOI: 10.1142/9789811286421_0017
Yi Yang, Han Xie, Hejie Cui, Carl Yang
Recent advances in neuroimaging techniques have sparked growing interest in understanding the complex interactions between anatomical regions of interest (ROIs), which form brain networks that play a crucial role in clinical tasks such as neural pattern discovery and disorder diagnosis. In recent years, graph neural networks (GNNs) have emerged as powerful tools for analyzing network data. However, due to the complexity of data acquisition and regulatory restrictions, brain network studies remain limited in scale and are often confined to individual institutions. These limitations make it difficult for GNN models to capture useful neural circuitry patterns and deliver robust downstream performance. As a distributed machine learning paradigm, federated learning (FL) offers a promising way to address resource limitations and privacy concerns by enabling collaborative learning across local institutions (i.e., clients) without sharing data. While data heterogeneity has been studied extensively in the recent FL literature, cross-institutional brain network analysis poses unique heterogeneity challenges: ROI parcellation systems are inconsistent across local neuroimaging studies, and predictive neural circuitry patterns vary between them. To this end, we propose FedBrain, a GNN-based personalized FL framework that takes into account the unique properties of brain network data. Specifically, we present a federated atlas mapping mechanism to overcome the feature and structure heterogeneity of brain networks arising from different ROI atlas systems, and a clustering approach guided by clinical prior knowledge to address predictive neural circuitry patterns that vary across patient groups, neuroimaging modalities, and clinical outcomes.
Compared with existing FL strategies, our approach achieves superior and more consistent performance, showcasing its strong potential and generalizability in cross-institutional connectome-based brain imaging analysis. The implementation is available here.
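The basic idea of collaborative learning "without data sharing" can be illustrated with federated averaging (FedAvg), a standard FL baseline in which each client trains locally and a server averages the resulting parameters, weighted by local sample counts. This is a generic sketch, not FedBrain's personalized algorithm: it omits the atlas-mapping and clinically guided clustering steps, and the function name and the use of flat Python lists as parameter vectors are assumptions for illustration.

```python
from typing import Sequence


def fedavg(client_params: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> list[float]:
    """Average client parameter vectors, weighting each client by its sample count.

    Only parameters leave each institution; the raw brain networks stay local.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_params[0])
    # Plain averaging only works if every client's model has the same shape.
    if any(len(p) != dim for p in client_params):
        raise ValueError("all clients must share the same parameter shape")
    averaged = [0.0] * dim
    for params, n in zip(client_params, client_sizes):
        weight = n / total
        for i, value in enumerate(params):
            averaged[i] += weight * value
    return averaged


if __name__ == "__main__":
    # Three hypothetical sites with different cohort sizes.
    site_params = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    site_sizes = [10, 30, 60]
    print(fedavg(site_params, site_sizes))
```

The shape check hints at why brain networks need extra machinery: if node features are derived from an ROI atlas, sites using different parcellations may plausibly end up with incompatible input dimensions, which is the kind of feature and structure heterogeneity the abstract's atlas mapping targets.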
Pacific Symposium on Biocomputing, pages 214-225.
Citations: 0
APPLICATION OF QUANTILE DISCRETIZATION AND BAYESIAN NETWORK ANALYSIS TO PUBLICLY AVAILABLE CYSTIC FIBROSIS DATA SETS.
Q2 Computer Science Pub Date : 2023-12-17 DOI: 10.1142/9789811286421_0041
Kiyoshi Ferreira Fukutani, Thomas H. Hampton, Carly A. Bobak, Todd A. MacKenzie, Bruce A. Stanton
The availability of multiple publicly available datasets studying the same phenomenon promises to accelerate scientific discovery. Meta-analysis can address reproducibility issues and often increases statistical power. This promise is especially relevant to rarer diseases like cystic fibrosis (CF), which affects roughly 100,000 people worldwide. A recent search of the National Institutes of Health's Gene Expression Omnibus revealed 1.3 million data sets related to cancer, compared with about 2,000 related to CF. These studies are highly diverse, involving different tissues, animal models, treatments, and clinical covariates. In our search for gene expression studies of primary human airway epithelial cells, we identified three studies with compatible methodologies and sufficient metadata: GSE139078, Sala Study, and PRJEB9292. Even so, the experimental designs were not identical, and we identified significant batch effects that would have complicated functional analysis. Here we present quantile discretization and Bayesian network construction using the hill-climbing method as a powerful tool to overcome experimental differences and reveal biologically relevant responses to the CF genotype itself and to exposure to viruses, bacteria, and drugs used to treat CF. Functional analysis with clusterProfiler revealed significant alterations in interferon signaling, interferon gamma signaling, interleukin 4 and 13 signaling, interleukin 6 signaling, interleukin 21 signaling, and inactivation of CSF3 (G-CSF) signaling pathways. These pathways were consistently associated with higher gene expression in CF epithelial cells than in non-CF cells, suggesting that targeting them could improve clinical outcomes. The success of quantile discretization and Bayesian network analysis in the context of CF suggests that these approaches may be applicable to other settings where exactly comparable data sets are hard to find.
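The key property of quantile discretization is that each gene is binned by its rank within its own study, so study-specific shifts in scale or offset (one common form of batch effect) do not change the discrete labels. The sketch below assumes three approximately equal-frequency bins and per-study binning; the bin count, the tie rule, and the helper names are illustrative choices, not details taken from the paper, and the subsequent hill-climbing Bayesian network search is not shown.

```python
import bisect
import statistics
from typing import Sequence


def quantile_discretize(values: Sequence[float], n_bins: int = 3) -> list[int]:
    """Assign each value an approximately equal-frequency bin index in 0..n_bins-1.

    Cut points are the within-sample quantiles, so the result depends only on
    the ranks of the values, not on their scale or offset. Values equal to a
    cut point go to the lower bin, so tied values always share a bin.
    """
    if n_bins < 2:
        raise ValueError("n_bins must be at least 2")
    if len(values) < 2:
        raise ValueError("need at least two values to estimate quantiles")
    cuts = statistics.quantiles(values, n=n_bins)  # n_bins - 1 cut points
    return [bisect.bisect_left(cuts, v) for v in values]


def discretize_per_study(studies: dict[str, Sequence[float]],
                         n_bins: int = 3) -> dict[str, list[int]]:
    """Discretize one gene separately within each study (hypothetical helper)."""
    return {name: quantile_discretize(vals, n_bins) for name, vals in studies.items()}


if __name__ == "__main__":
    # Hypothetical expression of one gene in two studies; study B is on a
    # different scale (10x plus an offset), mimicking a batch effect.
    gene = {
        "study_A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        "study_B": [110.0, 120.0, 130.0, 140.0, 150.0, 160.0],
    }
    print(discretize_per_study(gene))
```

Because the bins depend only on within-study ranks, both studies map to identical labels despite the scale shift; a structure-learning step such as hill-climbing search over Bayesian network structures would then operate on these discrete states rather than on raw, batch-affected values.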
Pacific Symposium on Biocomputing, pages 534-548.
Citations: 0
Impact of Measurement Noise on Genetic Association Studies of Cardiac Function.
Q2 Computer Science Pub Date : 2023-12-17 DOI: 10.1142/9789811286421_0011
Milos Vukadinovic, Gauri Renjith, Victoria Yuan, Alan Kwan, Susan C. Cheng, Debiao Li, Shoa L. Clarke, David Ouyang
Recent research has effectively used quantitative traits derived from imaging to boost the capabilities of genome-wide association studies (GWAS), furthering our understanding of disease biology and various traits. However, phenotyping inherently carries measurement error and noise that can influence subsequent genetic analyses. This study focused on left ventricular ejection fraction (LVEF), a vital yet potentially inaccurate quantitative measurement, to investigate how imprecision in phenotype measurement affects genetic studies. Several methods of acquiring LVEF, together with simulated measurement noise, were assessed for their effects on downstream genetic analyses. The results showed that introducing just 7.9% measurement noise could eliminate all genetic associations in an LVEF GWAS of almost forty thousand individuals. Moreover, a 1% increase in the mean absolute error (MAE) of LVEF reduced GWAS power as much as a 10% reduction in cohort sample size. Therefore, improving phenotyping accuracy is crucial to maximizing the effectiveness of genome-wide association studies.
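One way to see why phenotype noise behaves like a smaller cohort: under a simple additive model with classical (independent) measurement error, a variant explaining a fraction q² of the true trait variance σ²_y explains only q² · σ²_y / (σ²_y + σ²_e) of the noisy trait's variance, and for small effects the association test's noncentrality parameter is roughly N times that fraction. Noise therefore shrinks the effective sample size by the reliability σ²_y / (σ²_y + σ²_e). The sketch below applies this textbook approximation to hypothetical inputs; it is not the paper's simulation, and it is not calibrated to reproduce the 7.9% or 1%-MAE figures above.

```python
from statistics import NormalDist


def reliability(trait_var: float, noise_var: float) -> float:
    """Fraction of observed phenotype variance that is true signal."""
    if trait_var <= 0 or noise_var < 0:
        raise ValueError("trait_var must be positive and noise_var non-negative")
    return trait_var / (trait_var + noise_var)


def gwas_power(n: int, q2: float, trait_var: float = 1.0,
               noise_var: float = 0.0, alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided 1-df association test.

    q2 is the fraction of *true* trait variance explained by the variant.
    Noise inflates the phenotype variance, so the noncentrality parameter
    becomes n * q2 * reliability, i.e. the same as using n * reliability
    samples with a noise-free phenotype (small-effect approximation).
    """
    ncp = n * q2 * reliability(trait_var, noise_var)
    z = NormalDist()  # standard normal
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = ncp ** 0.5
    # P(|Z + shift| > z_crit) for Z ~ N(0, 1)
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


if __name__ == "__main__":
    n, q2 = 40_000, 0.001  # hypothetical cohort size and variant effect
    for noise_var in (0.0, 0.1, 0.25):
        r = reliability(1.0, noise_var)
        power = gwas_power(n, q2, 1.0, noise_var)
        print(f"noise var {noise_var:.2f}: reliability {r:.3f}, "
              f"effective N {n * r:,.0f}, power {power:.3f}")
```

The design choice here is to express noise purely through reliability, which makes the sample-size equivalence explicit; real imaging phenotypes may have non-classical error (for example, error correlated with the true value), where this approximation would not hold.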
Pacific Symposium on Biocomputing, pages 134-147.
Citations: 0