Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing: Latest Publications, Page 6

Earth Friendly Computation: Applying Indigenous Data Lifecycles in Medical and Sovereign AI.

Q2 Computer Science

Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0053

Keolu Fox, Krystal Tsosie, Alex Ioannidis, Kaja Wasik, Alec Calac, Eric Dawson

The following sections are included: Overview; Background & key terms; Earth Friendly Computation: Indigenous Data Sovereignty, Circular Systems, and Solarpunk Solutions for a Sustainable Future; AI in Point-of-Care: A Sustainable Healthcare Revolution at the Edge; Conclusion: The Future of Earth Friendly Computation; Acknowledgments; References.

Citations: 0

Enhancing Privacy-Preserving Cancer Classification with Convolutional Neural Networks.

Q2 Computer Science

Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0040

Aurora A F Colombo, Luca Colombo, Alessandro Falcetta, Manuel Roveri

Precision medicine significantly improves patient prognosis by offering personalized treatments. For metastatic cancer in particular, incorporating the primary tumor location into the diagnostic process greatly improves survival rates. However, traditional methods rely on human expertise, requiring substantial time and financial resources. Machine Learning (ML) and Deep Learning (DL) have proven particularly effective in addressing this challenge. Yet their application to medical data, especially genomic data, must account for privacy given the highly sensitive nature of the data. In this paper, we propose OGHE, a convolutional neural network-based approach for privacy-preserving cancer classification designed to exploit spatial patterns in genomic data while maintaining confidentiality by means of Homomorphic Encryption (HE). This encryption scheme allows computation directly on encrypted data, guaranteeing confidentiality throughout the entire computation. OGHE is designed specifically for privacy-preserving applications: it takes HE limitations into account from the outset and introduces an efficient packing mechanism to minimize the computational overhead introduced by HE. Additionally, OGHE relies on a novel feature selection method, VarScout, designed to extract the most significant features through clustering and occurrence analysis while preserving inherent spatial patterns. OGHE, coupled with VarScout, has been compared with existing privacy-preserving solutions for encrypted cancer classification on the iDash 2020 dataset, demonstrating its effectiveness in providing accurate privacy-preserving cancer classification and its reduced latency thanks to the packing mechanism. The code is released to the scientific community.
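The abstract does not detail OGHE's packing mechanism. As a generic illustration of why packing helps, the sketch below simulates in plaintext NumPy how a 1D convolution can be evaluated on a SIMD-packed vector using only the operations that packed HE schemes (such as CKKS) expose: slot-wise multiplication by a plaintext weight, addition, and cyclic slot rotation. Here `np.roll` stands in for a ciphertext rotation, `rotate` and `packed_conv1d` are hypothetical names, and no encryption is performed.

```python
import numpy as np


def rotate(slots: np.ndarray, k: int) -> np.ndarray:
    """Plaintext stand-in for an HE slot rotation: cyclic shift left by k."""
    return np.roll(slots, -k)


def packed_conv1d(slots: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid 1D cross-correlation of a packed vector, built only from
    rotations, plaintext-weight multiplications, and additions."""
    n, k = len(slots), len(kernel)
    acc = np.zeros(n)
    for j, weight in enumerate(kernel):
        # One rotation and one scalar multiply per kernel tap, applied to
        # all n slots at once: this is the SIMD saving that packing buys.
        acc = acc + weight * rotate(slots, j)
    # Slots beyond n - k would mix in wrapped-around values, so drop them.
    return acc[: n - k + 1]


features = np.arange(8.0)            # one sample's selected features, packed
kernel = np.array([1.0, 2.0, 3.0])   # plaintext convolution weights
print(packed_conv1d(features, kernel))
```

With k taps the whole packed vector costs k - 1 rotations regardless of its length, which is the general reason packing many features (or several zero-padded samples) into one ciphertext reduces latency.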

Volume 30, pages 565-579.

Citations: 0

ReXamine-Global: A Framework for Uncovering Inconsistencies in Radiology Report Generation Metrics.

Q2 Computer Science

Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0014

Oishi Banerjee, Agustina Saenz, Kay Wu, Warren Clements, Adil Zia, Dominic Buensalido, Helen Kavnoudias, Alain S Abi-Ghanem, Nour El Ghawi, Cibele Luna, Patricia Castillo, Khaled Al-Surimi, Rayyan A Daghistani, Yuh-Min Chen, Heng-Sheng Chao, Lars Heiliger, Moon Kim, Johannes Haubold, Frederic Jonske, Pranav Rajpurkar

Given the rapidly expanding capabilities of generative AI models for radiology, there is a need for robust metrics that can accurately measure the quality of AI-generated radiology reports across diverse hospitals. We develop ReXamine-Global, an LLM-powered, multi-site framework that tests metrics across different writing styles and patient populations, exposing gaps in their generalization. First, our method tests whether a metric is undesirably sensitive to reporting style, i.e., whether it gives different scores depending on whether AI-generated reports are stylistically similar to ground-truth reports. Second, our method measures whether a metric reliably agrees with experts, or whether metric and expert scores of AI-generated report quality diverge at some sites. Using 240 reports from 6 hospitals around the world, we apply ReXamine-Global to 7 established report evaluation metrics and uncover serious gaps in their generalizability. Developers can apply ReXamine-Global when designing new report evaluation metrics to ensure their robustness across sites. Additionally, our analysis of existing metrics can guide users of those metrics toward evaluation procedures that work reliably at their sites of interest.
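The abstract does not name the agreement statistic ReXamine-Global uses. Purely as an illustration of the site-level check in the second step, the sketch below computes Kendall's tau (tau-a, no tie correction) between metric and expert scores separately for each site and flags sites whose agreement falls below a threshold. The record layout, the 0.3 threshold, and the assumption that both scores are oriented so that higher means better are all hypothetical choices.

```python
from collections import defaultdict
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two reports per site")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def per_site_agreement(records):
    """records: iterable of (site, metric_score, expert_score) tuples."""
    by_site = defaultdict(lambda: ([], []))
    for site, metric_score, expert_score in records:
        by_site[site][0].append(metric_score)
        by_site[site][1].append(expert_score)
    return {site: kendall_tau(m, e) for site, (m, e) in by_site.items()}


def divergent_sites(agreement, threshold=0.3):
    """Sites where the metric tracks expert judgment poorly (hypothetical cutoff)."""
    return sorted(site for site, tau in agreement.items() if tau < threshold)


records = [
    ("site_A", 0.9, 5), ("site_A", 0.5, 3), ("site_A", 0.2, 1),
    ("site_B", 0.9, 1), ("site_B", 0.5, 3), ("site_B", 0.2, 5),
]
agreement = per_site_agreement(records)
print(agreement, divergent_sites(agreement))
```

A rank statistic is used here because metric and expert scores usually live on different scales; only their orderings are compared.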

Volume 30, pages 185-198.

Citations: 0

Exploring the Granularity of the Illnesses-Related Changes in Regional Homogeneity in Major Depressive Disorder using the UKBB Data.

Q2 Computer Science

Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0046

Yewen Huang, Syed Ibrar Hussain, Demetrio Labate, Robert Azencott, Paul Thompson, Bhim Adhikari, Peter Kochunov

Illness-related brain effects of neuropsychiatric disorders are not regionally uniform: some regions show large pathological effects while others are relatively spared. Presently, Big Data meta-analytic studies tabulate these effects using structural and/or functional brain atlases based on the anatomical boundaries, landmarks, and connectivity patterns of healthy brains. These patterns are then translated into individual-level predictors using approaches such as the Regional Vulnerability Index (RVI), which quantifies the agreement between an individual's brain pattern and the canonical pattern found in the illness. However, atlases derived from healthy brains are unlikely to align with the deficit patterns expressed in specific disorders such as Major Depressive Disorder (MDD), which reduces the statistical power of individualized predictions. Here, we evaluated a novel approach in which disorder-specific templates are constructed using the Kullback-Leibler (KL) distance to balance granularity, signal-to-noise ratio, and the contrast between regional effect sizes, so as to maximize the translatability of the population-wide illness pattern to the level of the individual. We used regional homogeneity (ReHo) maps extracted from resting-state functional MRI for an MDD sample of N = 2,289 (mean age ± s.d.: 63.2 ± 7.2 years) and N = 6,104 control subjects (mean age ± s.d.: 62.9 ± 7.2 years) who were free of MDD and any other mental condition. The cortical effects of MDD were analyzed on 3D spherical surfaces representing the cerebral hemispheres. The KL distance was used to organize the cortical surface into 28 regions of interest based on effect sizes, connectivity, and signal-to-noise ratio. RVI values calculated with this novel approach showed a significantly higher effect size of the illness than those calculated with the standard Desikan brain atlas.
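The RVI step can be sketched as follows, assuming the commonly described formulation: each subject's regional values are z-scored against the control group, and RVI is the Pearson correlation between those z-scores and the population-level pattern of regional effect sizes (Cohen's d). This is a minimal NumPy sketch with hypothetical function names and synthetic data; it does not reproduce the KL-distance parcellation, which would only change which regions (for example, the 28 disorder-specific regions rather than Desikan regions) the columns represent.

```python
import numpy as np


def cohens_d(patients: np.ndarray, controls: np.ndarray) -> np.ndarray:
    """Per-region Cohen's d; inputs are (subjects, regions) arrays."""
    n1, n2 = len(patients), len(controls)
    pooled_var = ((n1 - 1) * patients.var(axis=0, ddof=1)
                  + (n2 - 1) * controls.var(axis=0, ddof=1)) / (n1 + n2 - 2)
    return (patients.mean(axis=0) - controls.mean(axis=0)) / np.sqrt(pooled_var)


def regional_vulnerability_index(subject: np.ndarray, controls: np.ndarray,
                                 effect_sizes: np.ndarray) -> float:
    """Correlation between a subject's control-normalized regional values
    and the disorder's regional effect-size pattern."""
    z = (subject - controls.mean(axis=0)) / controls.std(axis=0, ddof=1)
    return float(np.corrcoef(z, effect_sizes)[0, 1])


rng = np.random.default_rng(0)
pattern = np.linspace(0.0, 0.5, 28)              # hypothetical graded deficit
controls = rng.normal(size=(500, 28))            # synthetic regional ReHo values
patients = rng.normal(size=(300, 28)) - pattern  # patients lower in affected regions
d = cohens_d(patients, controls)
print(regional_vulnerability_index(patients[0], controls, d))
```

A subject whose deviations from controls exactly follow the effect-size pattern gets an RVI of 1; a parcellation that better separates strongly and weakly affected regions sharpens that pattern, which is the motivation the abstract gives for disorder-specific templates.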

Volume 30, pages 647-663.

Citations: 0

Assessment of Drug Impact on Laboratory Test Results in Hospital Settings.

Q2 Computer Science

Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0026

Victorine P Muse, Amalie D Haue, Cristina L Rodríguez, Alejandro A Orozco, Jorge H Biel, Søren Brunak

Adverse drug events (ADEs) in patients on polypharmaceutical regimens present a huge challenge to modern healthcare. While computational efforts may reduce the incidence of these ADEs, current strategies typically do not generalize to standard healthcare systems. To address this, we carried out a retrospective study aimed at developing a statistical approach to detect and quantify potential ADEs. The data foundation comprised almost 2 million patients from two health regions in Denmark, together with their drug and laboratory data from 2011 to 2016. We developed a series of multistate Cox models to compute hazard ratios for changes in laboratory test results before and after drug exposure. By linking the results to data from a drug-drug interaction database, we found that the models show potential for use by medical safety agencies and for improving the efficiency of drug approval pipelines.
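To make the hazard-ratio step concrete, the sketch below fits a standard single-covariate Cox proportional hazards model by Newton-Raphson on the partial likelihood (Breslow-style risk sets, with no special handling of tied times) and returns exp(beta) as the hazard ratio for a binary exposure. This is a deliberate simplification: the study used multistate Cox models of laboratory-result transitions, which this sketch does not reproduce. The function name and toy data are hypothetical; in practice an established package such as lifelines or R's survival would be used.

```python
import numpy as np


def cox_hazard_ratio(time, event, exposed, max_iter=50, tol=1e-10):
    """Hazard ratio exp(beta) for one covariate in a Cox PH model."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    x = np.asarray(exposed, dtype=float)
    beta = 0.0
    for _ in range(max_iter):
        score, information = 0.0, 0.0
        for i in np.flatnonzero(event):
            at_risk = time >= time[i]            # risk set at this event time
            w = np.exp(beta * x[at_risk])
            s0 = w.sum()
            s1 = (w * x[at_risk]).sum()
            s2 = (w * x[at_risk] ** 2).sum()
            score += x[i] - s1 / s0              # gradient of log partial likelihood
            information += s2 / s0 - (s1 / s0) ** 2
        step = score / information               # Newton-Raphson update
        beta += step
        if abs(step) < tol:
            break
    return float(np.exp(beta))


# Toy data: time to an abnormal lab value; exposed patients tend to fail earlier.
time = [1, 2, 3, 4, 5, 6, 7, 8]
event = [1, 1, 1, 1, 1, 1, 0, 1]
exposed = [1, 1, 0, 1, 0, 1, 0, 0]
print(cox_hazard_ratio(time, event, exposed))
```

Comparing such hazard ratios before and after exposure, per drug and per laboratory test, is the kind of contrast the abstract describes; a hazard ratio above 1 indicates the abnormal result occurs sooner under exposure.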

Citations: 0

All Together Now: Data Work to Advance Privacy, Science, and Health in the Age of Synthetic Data.

Q2 Computer Science

Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0049

Lindsay Fernández-Rhodes, Jennifer K Wagner

There is a disconnect between data practices in biomedicine and public understanding of those practices, and this disconnect is widening rapidly (with the emergence of synthetic data and digital twins and the wider adoption of Artificial Intelligence (AI)/Machine Learning tools). Transparency alone is insufficient to bridge this gap. Concurrently, researchers engaged in biocomputing and digital health research must navigate an increasingly complex landscape of laws, regulations, and institutional/programmatic policies, which makes it increasingly difficult for those wanting to "get it right" or "do the right thing." Mandatory data protection obligations vary widely, sometimes depending on the type of data (with nuanced definitions and scope parameters), the actor/entity involved, or the residency of the data subjects. Additional challenges come from attempts to celebrate biocomputing discoveries and digital health innovations, which frequently turn fair and accurate communications into exaggerated hype (e.g., to secure financial investment in future projects or to obtain more favorable tenure and promotion decisions). Trust in scientists and scientific expertise can be quickly eroded if, for example, the public perceives synthetic data as "fake data" or digital twins as "imaginary" patients. Researchers appear increasingly aware of the scientific and moral imperative to strengthen their work and sustain it through increased diversity and community engagement. Moreover, there is a growing appreciation for the "data work" needed to turn scientific data into meaningful, actionable information, knowledge, and wisdom, not only for scientists but also for the individuals from whom those data were derived or to whom they relate.
Equity in the process of biocomputing and equity in the distribution of its benefits and burdens both demand the ongoing development, implementation, and refinement of embedded Ethical, Legal and Social Implications (ELSI) research practices. This workshop is intended to nurture interdisciplinary discussion of these issues and to highlight skills and competencies that are all too often dismissed as "soft skills," peripheral to the skills prioritized in traditional training and professional development programs. Data scientists attending this workshop will be better equipped to embed ELSI practices into their research.

Volume 30, pages 690-695.

Citations: 0

Cross-Species Modeling Identifies Gene Signatures in Type 2 Diabetes Mouse Models Predictive of Inflammatory and Estrogen Signaling Pathways Associated with Alzheimer's Disease Outcomes in Humans.

Q2 Computer Science

Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0031

Brendan K Ball, Elizabeth A Proctor, Douglas K Brubaker

Alzheimer's disease (AD), the predominant form of dementia, is influenced by several risk factors, including type 2 diabetes (T2D), a metabolic disorder characterized by dysregulated blood sugar levels. Although mouse and human studies have reported this connection between T2D and AD, the mechanism by which T2D contributes to AD pathobiology is not well understood. One challenge in understanding mechanistic links between these conditions is that evidence from mouse and human experimental models must be synthesized, yet translating between these systems is difficult because of evolutionary distance, physiological differences, and human heterogeneity. To address this, we employed a computational framework called translatable components regression (TransComp-R), which uses omics data to overcome discrepancies between pre-clinical and clinical studies. Here, we developed a novel extension of TransComp-R for multi-disease modeling, analyzing transcriptomic data from brain samples of mouse models of AD, T2D, and the co-occurrence of both diseases (ADxT2D), together with postmortem human brain data, to identify enriched pathways predictive of human AD status. Our TransComp-R model identified inflammatory and estrogen signaling pathways encoded by mouse principal components (PCs) that were derived from the T2D and ADxT2D models, but not from the AD-only model, and that were predictive of human AD outcomes. The same mouse PCs predictive of human AD outcomes captured sex-dependent differences in human AD biology, including significant effects unique to female patients, even though the TransComp-R model was derived from data from male mice only. We demonstrated that our approach identifies biological pathways of interest at the intersection of the complex etiologies of AD and T2D, which may guide future studies into pathogenesis and therapeutic development for patients with T2D-associated AD.
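The core TransComp-R idea can be sketched in three steps: run PCA on mouse expression data over homologous genes, project human samples onto the mouse PCs, and regress human disease status on the projected scores. The NumPy sketch below assumes genes are already matched across species and standardized; the function names, the per-dataset centering, the ridge-penalized logistic regression, and the synthetic data are all illustrative choices, not the authors' multi-disease extension.

```python
import numpy as np


def mouse_loadings(mouse_expr: np.ndarray, n_components: int) -> np.ndarray:
    """PCA on mouse data (samples x homologous genes); returns genes x PCs."""
    centered = mouse_expr - mouse_expr.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_components].T


def project_human(human_expr: np.ndarray, loadings: np.ndarray) -> np.ndarray:
    """Human samples expressed as scores on the mouse PCs."""
    return (human_expr - human_expr.mean(axis=0)) @ loadings


def fit_logistic(scores: np.ndarray, y: np.ndarray, ridge: float = 0.1,
                 n_iter: int = 50) -> np.ndarray:
    """Ridge-penalized logistic regression by Newton's method (IRLS).
    Returns [intercept, one coefficient per PC]; the intercept is unpenalized."""
    X = np.column_stack([np.ones(len(scores)), scores])
    penalty = ridge * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (y - p) - penalty @ w
        hessian = X.T @ (X * (p * (1 - p))[:, None]) + penalty
        w = w + np.linalg.solve(hessian, grad)
    return w


def predict_proba(scores: np.ndarray, w: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-(w[0] + scores @ w[1:])))


rng = np.random.default_rng(0)
n_genes = 20
signal = rng.normal(size=n_genes)
signal /= np.linalg.norm(signal)
# Synthetic mouse samples vary mostly along `signal`.
mouse = rng.normal(size=(30, 1)) * 5.0 * signal + rng.normal(scale=0.5, size=(30, n_genes))
# Synthetic human disease status shifts samples along the same direction.
y = rng.integers(0, 2, size=100)
human = (2.0 * (2 * y - 1))[:, None] * signal + rng.normal(size=(100, n_genes))
scores = project_human(human, mouse_loadings(mouse, n_components=1))
w = fit_logistic(scores, y.astype(float))
accuracy = ((predict_proba(scores, w) > 0.5) == y).mean()
print(f"training accuracy on the first mouse PC: {accuracy:.2f}")
```

The regression coefficients indicate which mouse PCs carry human-relevant signal; pathway enrichment would then be run on the gene loadings of those PCs.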

Volume 30, pages 426-440. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12674991/pdf/

Citations: 0

Session Introduction: Overcoming health disparities in precision medicine.

Q2 Computer Science

Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

Pub Date : 2024-01-01

Francisco M De La Vega, Kathleen C Barnes, Keolu Fox, Alexander Ioannidis, Eimear Kenny, Rasika A Mathias, Bogdan Pasaniuc

The following sections are included: Overview; Dealing with the lack of diversity in current research datasets; Development of fair machine learning algorithms; Race, genetic ancestry, and population structure; Conclusion; Acknowledgments.

Citations: 0

Subject Harmonization of Digital Biomarkers: Improved Detection of Mild Cognitive Impairment from Language Markers.

Q2 Computer Science

Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

Pub Date : 2024-01-01

Bao Hoang, Yijiang Pang, Hiroko H Dodge, Jiayu Zhou

Mild cognitive impairment (MCI) represents the early stage of dementia, including Alzheimer's disease (AD), and is a crucial stage for therapeutic intervention. Early detection of MCI offers opportunities for early intervention and significantly benefits cohort enrichment for clinical trials. Imaging and in vivo biomarkers in plasma and cerebrospinal fluid have high detection performance, yet their prohibitive cost and invasiveness call for more affordable and accessible alternatives. Recent advances in digital biomarkers, especially language markers, have shown great potential: variables informative of MCI are derived from language and/or speech and then used for predictive modeling. A major challenge in modeling language markers is the variability in how each person speaks. Because extensive data collection efforts usually keep cohort sizes for language studies small, this between-person variability makes language markers hard to generalize to unseen subjects. In this paper, we propose a novel subject harmonization tool that addresses distributional differences in language markers across subjects, thus enhancing the generalization performance of machine learning models. Our empirical results show that machine learning models built on our harmonized features achieve improved prediction performance on unseen data. The source code and experiment scripts are available at https://github.com/illidanlab/subject_harmonization.
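The abstract's claim concerns generalization to unseen subjects, which is only measured honestly if evaluation holds out whole subjects, so that no person's samples appear in both the training and test folds. The sketch below shows such a subject-grouped split in NumPy. It is not the authors' harmonization tool; `subject_folds` is a hypothetical helper, and scikit-learn's `GroupKFold` serves the same purpose.

```python
import numpy as np


def subject_folds(subject_ids, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which every subject's samples
    fall entirely in either the training or the test fold."""
    subject_ids = np.asarray(subject_ids)
    subjects = np.unique(subject_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(subjects)                  # randomize subject-to-fold assignment
    for fold_subjects in np.array_split(subjects, n_folds):
        test_mask = np.isin(subject_ids, fold_subjects)
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


# Ten hypothetical subjects with three recorded conversations each.
ids = np.repeat(np.arange(10), 3)
for train_idx, test_idx in subject_folds(ids, n_folds=5):
    print(sorted(set(ids[test_idx].tolist())), len(train_idx))
```

Splitting at the sample level instead would let a model memorize each person's speaking style, inflating scores on exactly the between-person variability the paper sets out to handle.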

Pacific Symposium on Biocomputing, vol. 29, pp. 187-200 (2024). PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11017207/pdf/

Citations: 0

BrainSTEAM: A Practical Pipeline for Connectome-based fMRI Analysis towards Subject Classification.

Q2 Computer Science

Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

Pub Date : 2024-01-01

Alexis Li, Yi Yang, Hejie Cui, Carl Yang

Functional brain networks represent dynamic and complex interactions among anatomical regions of interest (ROIs), providing crucial clinical insights for neural pattern discovery and disorder diagnosis. In recent years, graph neural networks (GNNs) have proven highly successful and effective in analyzing structured network data. However, because neuroimaging data are complex to acquire, training data are limited, and GNNs, like other deep learning models, suffer from overfitting. This also weakens their ability to capture neural patterns useful for downstream prediction. To address this challenge, this study proposes BrainSTEAM, an integrated framework featuring a spatio-temporal module that consists of an EdgeConv GNN model, an autoencoder network, and a Mixup strategy. In particular, the spatio-temporal module is designed to dynamically segment each subject's ROI time-series signals into chunked sequences. We use each sequence to construct a correlation network, thereby increasing the amount of training data. Additionally, we employ the EdgeConv GNN to capture ROI connectivity structures, an autoencoder for data denoising, and Mixup to enhance model training through linear data augmentation. We evaluate our framework on two real-world neuroimaging datasets: ABIDE for autism prediction and HCP for gender prediction. Extensive experiments demonstrate the superiority and robustness of BrainSTEAM compared with a variety of existing models, showcasing the strong potential of the proposed mechanisms to generalize to other connectome-based fMRI studies.
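The chunking and Mixup steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the BrainSTEAM implementation: windows here are fixed-length and overlapping (the hypothetical parameters `chunk_len` and `stride`, whereas the abstract says segmentation is dynamic), each window's network is a plain Pearson correlation matrix, and `mixup` follows the standard Beta-weighted formulation. The abstract does not specify these details or where in the pipeline Mixup is applied.

```python
import numpy as np


def chunked_correlation_networks(roi_signals, chunk_len, stride):
    """Build one Pearson-correlation network per time window.

    roi_signals: array of shape (n_timepoints, n_rois) for ONE subject.
    chunk_len, stride: window length and step in timepoints
    (hypothetical fixed values; BrainSTEAM segments dynamically).
    Returns an array of shape (n_chunks, n_rois, n_rois).
    """
    roi_signals = np.asarray(roi_signals, dtype=float)
    n_time = roi_signals.shape[0]
    if chunk_len > n_time:
        raise ValueError("chunk_len exceeds the length of the time series")
    nets = []
    for start in range(0, n_time - chunk_len + 1, stride):
        window = roi_signals[start:start + chunk_len]
        # np.corrcoef treats rows as variables, so pass (n_rois, chunk_len).
        nets.append(np.corrcoef(window.T))
    return np.stack(nets)


def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mixup: a convex combination of two inputs and their one-hot labels,
    with the mixing weight drawn from Beta(alpha, alpha)."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signals = rng.standard_normal((200, 10))  # 200 timepoints, 10 ROIs
    nets = chunked_correlation_networks(signals, chunk_len=50, stride=25)
    print(nets.shape)  # (7, 10, 10): seven windows from one subject
```

Each subject then contributes several networks instead of one, which is the sense in which chunking increases the training data. A window in which an ROI signal is constant would yield NaN correlations and would need handling in practice.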

Pacific Symposium on Biocomputing, vol. 29, pp. 53-64 (2024).

Citations: 0