
Pacific Symposium on Biocomputing: Latest Articles

BrainSTEAM: A Practical Pipeline for Connectome-based fMRI Analysis towards Subject Classification.
Q2 Computer Science Pub Date : 2023-12-17 DOI: 10.1142/9789811286421_0005
Alexis Li, Yi Yang, Hejie Cui, Carl Yang
Functional brain networks represent dynamic and complex interactions among anatomical regions of interest (ROIs), providing crucial clinical insights for neural pattern discovery and disorder diagnosis. In recent years, graph neural networks (GNNs) have proven highly effective in analyzing structured network data. However, because neuroimaging data are complex to acquire, training data are limited, and GNNs, like all deep learning models, suffer from overfitting. Their capability to capture useful neural patterns for downstream prediction is also adversely affected. To address this challenge, this study proposes BrainSTEAM, an integrated framework featuring a spatio-temporal module that consists of an EdgeConv GNN model, an autoencoder network, and a Mixup strategy. In particular, the spatio-temporal module dynamically segments the time series signals of the ROI features for each subject into chunked sequences. We use each sequence to construct a correlation network, thereby increasing the amount of training data. Additionally, we employ the EdgeConv GNN to capture ROI connectivity structures, an autoencoder for data denoising, and Mixup for enhancing model training through linear data augmentation. We evaluate our framework on two real-world neuroimaging datasets, ABIDE for autism prediction and HCP for gender prediction. Extensive experiments demonstrate the superiority and robustness of BrainSTEAM compared with a variety of existing models, showcasing the strong potential of the proposed mechanisms to generalize to other studies in connectome-based fMRI analysis.
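The chunking and Mixup steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the fixed, non-overlapping window, the function names (`pearson`, `chunked_correlation_networks`, `mixup`), and the random toy signals are all assumptions made for illustration, and the abstract itself only says the segmentation is "dynamic".

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def chunked_correlation_networks(series, window):
    """Split each ROI signal into non-overlapping windows (an assumed scheme)
    and return one ROI-by-ROI correlation matrix per window."""
    t = len(series[0])
    networks = []
    for start in range(0, t - window + 1, window):
        chunk = [roi[start:start + window] for roi in series]
        networks.append([[pearson(a, b) for b in chunk] for a in chunk])
    return networks


def mixup(x1, y1, x2, y2, lam):
    """Linearly interpolate two matrices and their labels with weight lam."""
    x = [[lam * a + (1 - lam) * b for a, b in zip(r1, r2)]
         for r1, r2 in zip(x1, x2)]
    y = lam * y1 + (1 - lam) * y2
    return x, y


# Toy data: 3 hypothetical ROIs, 12 time points each.
random.seed(0)
signals = [[random.gauss(0, 1) for _ in range(12)] for _ in range(3)]
nets = chunked_correlation_networks(signals, window=4)  # one subject, 3 networks
mixed, label = mixup(nets[0], 1.0, nets[1], 0.0, lam=0.7)
```

One subject's recording yields several correlation networks instead of one, which is the sense in which chunking increases the training data.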
Citations: 0
Creation of a Curated Database of Experimentally Determined Human Protein Structures for the Identification of Its Targetome.
Q2 Computer Science Pub Date : 2023-12-17 DOI: 10.1142/9789811286421_0023
Armand Ovanessians, Carson Snow, Thomas Jennewein, Susanta Sarkar, Gil Speyer, Judith Klein-Seetharaman
Assembling an "integrated structural map of the human cell" at atomic resolution will require a complete set of all human protein structures available for interaction with other biomolecules (the human protein structure targetome) and a pipeline of automated tools that allow quantitative analysis of millions of protein-ligand interactions. Toward this goal, we describe the creation of a curated database of experimentally determined human protein structures. Starting with the sequences of 20,422 human proteins, we selected the most representative structure for each protein (if available) from the Protein Data Bank (PDB), ranking structures by coverage of sequence by structure, depth (the difference between the final and initial residue number of each chain), resolution, and experimental method used to determine the structure. To enable expansion into an entire human targetome, we docked small molecule ligands to our curated set of protein structures. Using design constraints derived from comparing structure assembly and ligand docking results obtained with challenging protein examples, we propose to combine this curated database of experimental structures with AlphaFold predictions and multi-domain assembly using DEMO2 in the future. To demonstrate the utility of our curated database in identifying the human protein structure targetome, we used docking with AutoDock Vina and created tools for automated analysis of affinity and binding site locations of the thousands of protein-ligand prediction results. The resulting human targetome, which can be updated and expanded with an evolving curated database and increasing numbers of ligands, is a valuable addition to the growing toolkit of structural bioinformatics.
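The ranking criteria listed above (coverage, depth, resolution, experimental method) could be combined as a lexicographic sort key. The sketch below is only one plausible reading: the `Chain` fields, the priority order of the criteria, the method preference in `METHOD_RANK`, and the placeholder identifiers are all assumptions, not details given in the abstract.

```python
from dataclasses import dataclass

# Hypothetical preference order for experimental methods (lower is preferred).
METHOD_RANK = {"X-RAY DIFFRACTION": 0, "ELECTRON MICROSCOPY": 1, "SOLUTION NMR": 2}


@dataclass
class Chain:
    pdb_id: str             # placeholder identifier, not a real PDB entry
    first_residue: int
    last_residue: int
    observed_residues: int  # residues actually resolved in the structure
    resolution: float       # in angstroms; lower is better
    method: str


def sort_key(chain, sequence_length):
    """Higher coverage and depth first, then lower resolution, then method."""
    coverage = chain.observed_residues / sequence_length
    depth = chain.last_residue - chain.first_residue  # as defined in the abstract
    method = METHOD_RANK.get(chain.method, len(METHOD_RANK))
    return (-coverage, -depth, chain.resolution, method)


def most_representative(chains, sequence_length):
    """Return the top-ranked chain, or None when no structure is available."""
    if not chains:
        return None
    return min(chains, key=lambda c: sort_key(c, sequence_length))


candidates = [
    Chain("XXX1", 1, 100, 100, 2.5, "X-RAY DIFFRACTION"),
    Chain("XXX2", 1, 200, 200, 3.0, "ELECTRON MICROSCOPY"),
    Chain("XXX3", 1, 200, 200, 1.8, "X-RAY DIFFRACTION"),
]
best = most_representative(candidates, sequence_length=200)
```

Here the second and third candidates tie on coverage and depth, so the better resolution decides between them.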
Citations: 0
Session Introduction: Digital health technology data in biocomputing: Research efforts and considerations for expanding access (PSB2024).
Q2 Computer Science Pub Date : 2023-12-17 DOI: 10.1142/9789811286421_0013
Michelle Holko, Chris Lunt, Jessilyn P Dunn
Data from digital health technologies (DHT), including wearable sensors like Apple Watch, Whoop, Oura Ring, and Fitbit, are increasingly being used in biomedical research. Research and development of DHT-related devices, platforms, and applications is happening rapidly and with significant private-sector involvement with new biotech companies and large tech companies (e.g. Google, Apple, Amazon, Uber) investing heavily in technologies to improve human health. Many academic institutions are building capabilities related to DHT research, often in cross-sector collaboration with technology companies and other organizations with the goal of generating clinically meaningful evidence to improve patient care, to identify users at an earlier stage of disease presentation, and to support health preservation and disease prevention. Large research consortia, cross-sector partnerships, and individual research labs are all represented in the current corpus of published studies. Some of the large research studies, like NIH's All of Us Research Program, make data sets from wearable sensors available to the research community, while the vast majority of data from wearable sensors and other DHTs are held by private sector organizations and are not readily available to the research community. As data are unlocked from the private sector and made available to the academic research community, there is an opportunity to develop innovative analytics and methods through expanded access. This is the second year for this Session which solicited research results leveraging digital health technologies, including wearable sensor data, describing novel analytical methods, and issues related to diversity, equity, inclusion (DEI) of the research, data, and the community of researchers working in this area. We particularly encouraged submissions describing opportunities for expanding and democratizing academic research using data from wearable sensors and related digital health technologies.
Citations: 0
Tools for assembling the cell: Towards the era of cell structural bioinformatics.
Q2 Computer Science Pub Date : 2023-12-17 DOI: 10.1142/9789811286421_0052
Mengzhou Hu, Xikun Zhang, Andrew Latham, Andrej Šali, T. Ideker, Emma Lundberg
Cells consist of large components, such as organelles, that recursively factor into smaller systems, such as condensates and protein complexes, forming a dynamic multi-scale structure of the cell. Recent technological innovations have paved the way for systematic interrogation of subcellular structures, yielding unprecedented insights into their roles and interactions. In this workshop, we discuss progress, challenges, and collaboration to marshal various computational approaches toward assembling an integrated structural map of the human cell.
Citations: 0
Optimizing Computer-Aided Diagnosis with Cost-Aware Deep Learning Models.
Q2 Computer Science Pub Date : 2023-12-17 DOI: 10.1142/9789811286421_0009
Charmi Patel, Yiyang Wang, Thiruvarangan Ramaraj, Roselyne B. Tchoua, Jacob Furst, D. Raicu
Classical machine learning and deep learning models for Computer-Aided Diagnosis (CAD) commonly focus on overall classification performance, treating misclassification errors (false negatives and false positives) equally during training. This uniform treatment overlooks the distinct costs associated with each type of error, leading to suboptimal decision-making, particularly in the medical domain where it is important to improve the prediction sensitivity without significantly compromising overall accuracy. This study introduces a novel deep learning-based CAD system that incorporates a cost-sensitive parameter into the activation function. By applying our methodologies to two medical imaging datasets, our proposed study shows statistically significant increases of 3.84% and 5.4% in sensitivity while maintaining overall accuracy for Lung Image Database Consortium (LIDC) and Breast Cancer Histological Database (BreakHis), respectively. Our findings underscore the significance of integrating cost-sensitive parameters into future CAD systems to optimize performance and ultimately reduce costs and improve patient outcomes.
Citations: 0
Imputation of race and ethnicity categories using genetic ancestry from real-world genomic testing data.
Q2 Computer Science Pub Date : 2023-12-17 DOI: 10.1142/9789811286421_0033
Brooke Rhead, Paige E. Haffener, Y. Pouliot, Francisco M. De La Vega
The incompleteness of race and ethnicity information in real-world data (RWD) hampers its utility in promoting healthcare equity. This study introduces two methods, one heuristic and the other based on machine learning, to impute race and ethnicity from genetic ancestry using tumor profiling data. Analyzing de-identified data from over 100,000 cancer patients sequenced with the Tempus xT panel, we demonstrate that both methods outperform existing geolocation and surname-based methods, with the machine learning approach achieving high recall (range: 0.859-0.993) and precision (range: 0.932-0.981) across four mutually exclusive race and ethnicity categories. This work presents a novel pathway to enhance RWD utility in studying racial disparities in healthcare.
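The recall and precision ranges quoted above are per-category figures. As a reminder of how such per-class metrics are computed from true and imputed labels, here is a minimal sketch; the function name and the toy labels "A" and "B" are hypothetical and unrelated to the study's data or categories.

```python
def per_class_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)} for every label seen in either list."""
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        metrics[label] = (precision, recall)
    return metrics


# Toy example with two hypothetical categories.
truth = ["A", "A", "B", "B", "B"]
imputed = ["A", "B", "B", "B", "A"]
scores = per_class_precision_recall(truth, imputed)
```

Reporting the minimum and maximum of these values over all categories gives ranges of the kind quoted in the abstract.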
Citations: 0
Systematic Estimation of Treatment Effect on Hospitalization Risk as a Drug Repurposing Screening Method.
Q2 Computer Science Pub Date : 2023-12-17 DOI: 10.1142/9789811286421_0019
Costa Georgantas, Jaume Banus, Roger Hullin, Jonas Richiardi
Drug repurposing (DR) intends to identify new uses for approved medications outside their original indication. Computational methods for finding DR candidates usually rely on prior biological and chemical information on a specific drug or target but rarely utilize real-world observations. In this work, we propose a simple and effective systematic screening approach to measure medication impact on hospitalization risk based on large-scale observational data. We use common classification systems to group drugs and diseases into broader functional categories and test for non-zero effects in each drug-disease category pair. Treatment effects on the hospitalization risk of an individual disease are obtained by combining widely used methods for causal inference and time-to-event modelling. 6468 drug-disease pairs were tested using data from the UK Biobank, focusing on cardiovascular, metabolic, and respiratory diseases. We determined key parameters to reduce the number of spurious correlations and identified 7 statistically significant associations of reduced hospitalization risk after correcting for multiple testing. Some of these associations were already reported in other studies, including new potential applications for cardioselective beta-blockers and thiazides. We also found evidence for proton pump inhibitor side effects and multiple possible associations for anti-diabetic drugs. Our work demonstrates the applicability of the present screening approach and the utility of real-world data for identifying potential DR candidates.
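Testing 6468 drug-disease pairs makes multiple-testing correction essential. The abstract does not say which correction was applied, so the sketch below shows two standard options under that stated uncertainty: a Bonferroni threshold and the Benjamini-Hochberg step-up procedure. The function names and the toy p-values are illustrative assumptions.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that controls the family-wise error rate."""
    return alpha / n_tests


def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices of hypotheses rejected under BH FDR control."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its step-up threshold.
        if p_values[i] <= alpha * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])


# Threshold for the 6468 pairs mentioned in the abstract, at alpha = 0.05.
threshold = bonferroni_threshold(0.05, 6468)

# Toy p-values, not taken from the study.
rejected = benjamini_hochberg([0.001, 0.04, 0.03, 0.5])
```

With thousands of tests, the Bonferroni threshold falls below 1e-5, which helps explain why only a handful of associations survive correction.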
Citations: 0
EVALUATING THE RELATIONSHIPS BETWEEN GENETIC ANCESTRY AND THE CLINICAL PHENOME.
Q2 Computer Science Pub Date : 2023-12-17 DOI: 10.1142/9789811286421_0030
Jacqueline A. Piekos, Jeewoo Kim, Jacob M. Keaton, J. Hellwege, Todd L. Edwards, D. V. Velez Edwards
There is a desire in research to move away from the concept of race as a clinical factor because it is a societal construct used as an imprecise proxy for geographic ancestry. In this study, we leverage the biobank from Vanderbilt University Medical Center, BioVU, to investigate relationships between genetic ancestry proportion and the clinical phenome. For all samples in BioVU, we calculated six ancestry proportions based on 1000 Genomes references: eastern African (EAFR), western African (WAFR), northern European (NEUR), southern European (SEUR), eastern Asian (EAS), and southern Asian (SAS). From PheWAS, we found significant enrichment of the phecode category neoplasms for EAFR, WAFR, and SEUR, and of pregnancy complications for SEUR, NEUR, SAS, and EAS (p < 0.003). We then selected the phenotypes hypertension (HTN) and atrial fibrillation (AFib) to further investigate their relationships with EAFR, WAFR, SEUR, and NEUR using logistic regression modeling and non-linear restricted cubic spline (RCS) modeling. For EAS and SAS, we chose renal failure (RF) for further modeling. The relationships between HTN and AFib and the ancestries EAFR, WAFR, and SEUR were best fit by the linear model (beta p < 1 × 10⁻⁴ for all), while the relationships with NEUR were best fit by RCS (HTN ANOVA p = 0.001, AFib ANOVA p < 1 × 10⁻⁴). For RF, the relationship with SAS was best fit by a linear model (beta p < 1 × 10⁻⁴), while the RCS model was a better fit for EAS (ANOVA p < 1 × 10⁻⁴). In this study, we identify relationships between genetic ancestry and phenotypes that are best fit with non-linear modeling techniques. The assumption of linearity is integral to the proper fitting of a regression model, and it cannot be known before modeling whether a relationship is truly linear.
Citations: 0
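The abstract above contrasts a linear term with a restricted cubic spline (RCS) of the ancestry proportion. As a minimal sketch of what an RCS adds, the following builds the unnormalized RCS basis for one value. The function name `rcs_basis` and the knot positions are illustrative assumptions and are not taken from the paper; the paper's own knot choices and fitting software are not stated in the abstract.

```python
def rcs_basis(x, knots):
    """Return [x, s_1(x), ..., s_{k-2}(x)] for k knots (k >= 3).

    Each s_j is a cubic spline term constrained to be zero below the
    first knot and linear above the last knot. Hypothetical helper
    for illustration only.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("a restricted cubic spline needs at least 3 knots")

    def pos3(u):
        # Truncated cubic: (u)_+^3
        return max(u, 0.0) ** 3

    d = t[-1] - t[-2]  # spacing between the last two knots
    terms = [x]        # the plain linear term comes first
    for j in range(k - 2):
        terms.append(
            pos3(x - t[j])
            - pos3(x - t[-2]) * (t[-1] - t[j]) / d
            + pos3(x - t[-1]) * (t[-2] - t[j]) / d
        )
    return terms


# Assumed knots on the [0, 1] ancestry-proportion scale.
row = rcs_basis(0.7, [0.1, 0.5, 0.9])
```

A linear model would use only the first element of `row`; an RCS model uses all of them as covariates, so comparing the two fits (for example with a likelihood-based test) asks whether the extra spline terms are needed.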
Subject Harmonization of Digital Biomarkers: Improved Detection of Mild Cognitive Impairment from Language Markers.
Q2 Computer Science Pub Date : 2023-12-17 DOI: 10.1142/9789811286421_0015
Bao Hoang, Yijiang Pang, Hiroko H. Dodge, Jiayu Zhou
Mild cognitive impairment (MCI) represents the early stage of dementia, including Alzheimer's disease (AD), and is a crucial stage for therapeutic intervention. Early detection of MCI offers opportunities for early intervention and significantly benefits cohort enrichment for clinical trials. Imaging markers and in vivo biomarkers in plasma and cerebrospinal fluid have high detection performance, yet their prohibitive costs and intrusiveness demand more affordable and accessible alternatives. Recent advances in digital biomarkers, especially language markers, have shown great potential: variables informative of MCI are derived from linguistic and/or speech data and later used for predictive modeling. A major challenge in modeling language markers comes from the variability in how each person speaks. Because the cohort size for language studies is usually small due to the extensive data collection effort required, the variability among persons makes language markers hard to generalize to unseen subjects. In this paper, we propose a novel subject harmonization tool to address distributional differences in language markers across subjects, thus enhancing the generalization performance of machine learning models. Our empirical results show that machine learning models built on our harmonized features have improved prediction performance on unseen data. The source code and experiment scripts are available at https://github.com/illidanlab/subject_harmonization.
Citations: 0
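To make the idea of distributional differences across subjects concrete, here is a deliberately simple stand-in: standardizing each feature within each subject. This is not the harmonization method proposed in the paper, whose details are not given in the abstract; the function name and the toy data are assumptions for illustration.

```python
from statistics import mean, pstdev


def standardize_within_subject(records):
    """Z-score every feature separately for each subject.

    records: list of (subject_id, feature_vector) pairs.
    A simple baseline for removing per-subject shifts and scales,
    shown only as an illustration (hypothetical helper).
    """
    by_subject = {}
    for sid, vec in records:
        by_subject.setdefault(sid, []).append(vec)

    stats = {}
    for sid, vecs in by_subject.items():
        columns = list(zip(*vecs))
        # Fall back to a scale of 1.0 when a feature is constant
        # for this subject, to avoid dividing by zero.
        stats[sid] = [(mean(c), pstdev(c) or 1.0) for c in columns]

    return [
        (sid, [(v - m) / s for v, (m, s) in zip(vec, stats[sid])])
        for sid, vec in records
    ]


# Toy data: two subjects whose raw feature values sit on different scales.
toy = [
    ("A", [1.0, 10.0]),
    ("A", [3.0, 10.0]),
    ("B", [100.0, 0.0]),
    ("B", [300.0, 2.0]),
]
harmonized = standardize_within_subject(toy)
```

After this step both subjects' first feature is centered at zero with unit spread, so a downstream classifier sees relative rather than subject-specific values. Note that this baseline needs samples from a subject before it can be applied to that subject.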
intCC: An efficient weighted integrative consensus clustering of multimodal data
Q2 Computer Science Pub Date : 2023-12-17 DOI: 10.1142/9789811286421_0047
Can Huang, Pei Fen Kuan
High-throughput profiling of multiomics data provides a valuable resource for better understanding complex human diseases such as cancer and for potentially uncovering new subtypes. Integrative clustering has emerged as a powerful unsupervised learning framework for subtype discovery. In this paper, we propose an efficient weighted integrative clustering method called intCC that combines ensemble methods, consensus clustering, and kernel learning integrative clustering. We illustrate that intCC can accurately uncover latent cluster structures via extensive simulation studies and a case study on the TCGA pan-cancer datasets. An R package, intCC, implementing our proposed method is available at https://github.com/candsj/intCC.
Citations: 0
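The generic idea behind weighted consensus clustering can be sketched as a weighted co-association matrix: for every pair of samples, sum the weights of the clusterings that place them together. This is a sketch of the general technique only, not the intCC algorithm or the API of its R package; the function name, the weights, and the toy labelings are assumptions.

```python
def consensus_matrix(labelings, weights=None):
    """Weighted fraction of clusterings in which samples i and j co-cluster.

    labelings: list of label lists, one per base clustering
               (for example, one per data modality).
    weights:   optional non-negative weight per clustering;
               defaults to equal weights.
    Hypothetical helper for illustration.
    """
    n = len(labelings[0])
    if weights is None:
        weights = [1.0] * len(labelings)
    total = sum(weights)

    matrix = [[0.0] * n for _ in range(n)]
    for labels, w in zip(labelings, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += w / total
    return matrix


# Two base clusterings of three samples; the first is trusted three times as much.
cm = consensus_matrix([[0, 0, 1], [0, 1, 1]], weights=[3.0, 1.0])
```

The resulting matrix can be treated as a similarity matrix and passed to a final clustering step, such as hierarchical clustering, to obtain the consensus partition.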