
Patterns: Latest Articles

FRAMM: Fair ranking with missing modalities for clinical trial site selection
IF 6.5 Q2 Decision Sciences Pub Date : 2024-03-01 DOI: 10.1016/j.patter.2024.100944
Brandon Theodorou, Lucas Glass, Cao Xiao, Jimeng Sun
The underrepresentation of gender, racial, and ethnic minorities in clinical trials is a problem undermining the efficacy of treatments on minorities and preventing precise estimates of the effects within these subgroups. We propose FRAMM, a deep reinforcement learning framework for fair trial site selection to help address this problem. We focus on two real-world challenges: the data modalities used to guide selection are often incomplete for many potential trial sites, and the site selection needs to simultaneously optimize for both enrollment and diversity. To address the missing data challenge, FRAMM has a modality encoder with a masked cross-attention mechanism for bypassing missing data. To make efficient trade-offs, FRAMM uses deep reinforcement learning with a reward function designed to simultaneously optimize for both enrollment and fairness. We evaluate FRAMM using real-world historical clinical trials and show that it outperforms the leading baseline in enrollment-only settings while also greatly improving diversity.
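As a rough illustration of the masking idea mentioned in this abstract, the Python sketch below computes attention weights over per-modality embeddings and forces the weight of any missing modality to zero. The function name `masked_attention`, the list-based vectors, and the scaled dot-product scoring are assumptions made for this example; the abstract does not describe FRAMM's actual encoder in this detail.

```python
import math


def masked_attention(query, keys, values, present):
    """Attend over modality embeddings, skipping modalities flagged as missing.

    query   -- list of floats (hypothetical site/trial query vector)
    keys    -- one key vector per modality
    values  -- one value vector per modality
    present -- one bool per modality; False means the modality is missing
    """
    if not any(present):
        raise ValueError("at least one modality must be present")
    d = len(query)
    # Missing modalities get a score of -inf so that softmax assigns them weight 0.
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d) if ok else -math.inf
        for key, ok in zip(keys, present)
    ]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # math.exp(-inf) is 0.0
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


# Three modalities; the third one is missing for this site.
weights, output = masked_attention(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    values=[[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]],
    present=[True, True, False],
)
print(weights, output)
```

The masked entry contributes nothing to the output, so no placeholder values need to be imputed for the missing modality.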
Citations: 0
Federated learning for multi-omics: A performance evaluation in Parkinson’s disease
IF 6.5 Q2 Decision Sciences Pub Date : 2024-03-01 DOI: 10.1016/j.patter.2024.100945
Benjamin P. Danek, Mary B. Makarious, Anant Dadu, Dan Vitale, Paul Suhwan Lee, Andrew B. Singleton, Mike A. Nalls, Jimeng Sun, Faraz Faghri
While machine learning (ML) research has recently grown in popularity, its application in the omics domain is constrained by access to the sufficiently large, high-quality datasets needed to train ML models. Federated learning (FL) represents an opportunity to enable collaborative curation of such datasets among participating institutions. We compare the simulated performance of several models trained using FL against classically trained ML models on the task of multi-omics Parkinson’s disease prediction. We find that FL model performance tracks that of centrally trained ML models: the most performant FL model achieves an AUC-PR of 0.876 ± 0.009, which is 0.014 ± 0.003 lower than that of its centrally trained counterpart. We also determine that the dispersion of samples within a federation plays a meaningful role in model performance. Our study implements several open-source FL frameworks and aims to highlight some of the challenges and opportunities when applying these collaborative methods in multi-omics studies.
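To make the role of sample dispersion across a federation more concrete, here is a minimal Python sketch of sample-size-weighted parameter averaging, a common aggregation rule in federated learning (often called FedAvg). The abstract does not say which aggregation rules the study used, so the function `federated_average` and its inputs are purely illustrative assumptions.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighting each client by its sample count.

    client_weights -- one list of floats per client (hypothetical model parameters)
    client_sizes   -- number of local training samples held by each client
    """
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# A client holding 3 of the 4 samples pulls the global model toward its parameters.
print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]
```

Under this rule, how samples are split across institutions directly changes each client's influence on the shared model.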
Citations: 0
Predicting drug response through tumor deconvolution by cancer cell lines
IF 6.5 Q2 Decision Sciences Pub Date : 2024-03-01 DOI: 10.1016/j.patter.2024.100949
Yu-Ching Hsu, Yu-Chiao Chiu, Tzu-Pin Lu, Tzu-Hung Hsiao, Yidong Chen
Citations: 0
An evaluation of synthetic data augmentation for mitigating covariate bias in health data
IF 6.5 Q2 Decision Sciences Pub Date : 2024-02-29 DOI: 10.1016/j.patter.2024.100946
Lamin Juwara, Alaa El-Hussuna, Khaled El Emam
Data bias is a major concern in biomedical research, especially when evaluating large-scale observational datasets. It leads to imprecise predictions and inconsistent estimates in standard regression models. We compare the performance of commonly used bias-mitigating approaches (resampling, algorithmic, and post hoc approaches) against a synthetic data-augmentation method that utilizes sequential boosted decision trees to synthesize under-represented groups. The approach is called synthetic minority augmentation (SMA). Through simulations and analysis of real health datasets on a logistic regression workload, the approaches are evaluated across various bias scenarios (types and severity levels). Performance is assessed based on area under the curve, calibration (Brier score), precision of parameter estimates, confidence interval overlap, and fairness. Overall, SMA produces the results closest to the ground truth under low to medium bias (missing proportion of 50% or less). Under high bias (missing proportion of 80% or more), the advantage of SMA is not obvious, with no specific method consistently outperforming the others.
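For readers unfamiliar with the calibration metric named in this abstract, the Brier score for binary outcomes is the mean squared difference between predicted probabilities and observed 0/1 outcomes; lower is better. The short Python sketch below shows that standard definition; the function name `brier_score` and the toy inputs are illustrative and are not taken from the paper.

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between predicted probabilities and binary outcomes."""
    if len(probabilities) != len(outcomes) or not probabilities:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(probabilities)


# Perfectly confident, correct predictions score 0; uninformative 0.5 predictions score 0.25.
print(brier_score([1.0, 0.0], [1, 0]))  # 0.0
print(brier_score([0.5, 0.5], [1, 0]))  # 0.25
```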
Citations: 0
TCGA-Reports: A machine-readable pathology report resource for benchmarking text-based AI models
IF 6.5 Q2 Decision Sciences Pub Date : 2024-02-21 DOI: 10.1016/j.patter.2024.100933
Jenna Kefeli, Nicholas Tatonetti
Citations: 0
Propagating variational model uncertainty for bioacoustic call label smoothing
IF 6.5 Q2 Decision Sciences Pub Date : 2024-02-12 DOI: 10.1016/j.patter.2024.100932
Georgios Rizos, Jenna Lawson, Simon Mitchell, Pranay Shah, Xin Wen, Cristina Banks-Leite, Robert Ewers, Björn W. Schuller
Along with propagating the input toward making a prediction, Bayesian neural networks also propagate uncertainty. This has the potential to guide the training process by rejecting predictions of low confidence, and recent variational Bayesian methods can do so without Monte Carlo sampling of weights. Here, we apply sample-free methods for wildlife call detection on recordings made via passive acoustic monitoring equipment in the animals’ natural habitats. We further propose uncertainty-aware label smoothing, where the smoothing probability is dependent on sample-free predictive uncertainty, in order to down-weight data samples that should contribute less to the loss value. We introduce a bioacoustic dataset recorded in Malaysian Borneo, containing overlapping calls from 30 species. On that dataset, our proposed method achieves an absolute improvement of around 1.5 percentage points in area under the receiver operating characteristic curve (AU-ROC), 13 points in F1, and 19.5 points in expected calibration error (ECE) compared to the point-estimate network baseline, averaged across all target classes.
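The idea of tying label smoothing to predictive uncertainty can be sketched in a few lines of Python. The linear mapping from uncertainty to smoothing strength, the `max_smoothing` cap, and the function name `smooth_label` are all assumptions chosen for illustration; the abstract states only that the smoothing probability depends on sample-free predictive uncertainty.

```python
def smooth_label(label, uncertainty, max_smoothing=0.2):
    """Soften a binary target in proportion to the model's predictive uncertainty.

    label         -- hard target, 0.0 or 1.0
    uncertainty   -- value in [0, 1]; 0 means fully confident (hypothetical scale)
    max_smoothing -- smoothing strength applied at uncertainty 1 (assumed cap)
    """
    if not 0.0 <= uncertainty <= 1.0:
        raise ValueError("uncertainty must lie in [0, 1]")
    eps = max_smoothing * uncertainty
    # Standard binary label smoothing: mix the hard label with the uniform value 0.5.
    return label * (1.0 - eps) + 0.5 * eps


for u in (0.0, 0.5, 1.0):
    print(u, smooth_label(1.0, u))
```

Confident samples keep their hard labels, while uncertain ones are pulled toward 0.5 and therefore exert less pull on the loss.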
Citations: 0
Social construction of XAI: Do we need one definition to rule them all?
IF 6.5 Q2 Decision Sciences Pub Date : 2024-02-09 DOI: 10.1016/j.patter.2024.100926
Upol Ehsan, Mark O Riedl

In this opinion piece, Upol Ehsan and Mark Riedl argue that a single, monolithic definition of explainable AI (XAI) is neither feasible nor desirable at this stage of XAI's development.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10873153/pdf/
Citations: 0
The DIRAC framework: Geometric structure underlies roles of diversity and accuracy in combining classifiers
IF 6.5 Q2 Decision Sciences Pub Date : 2024-02-05 DOI: 10.1016/j.patter.2024.100924
Matthew J. Sniatynski, John A. Shepherd, Lynne R. Wilkens, D. Frank Hsu, Bruce S. Kristal

Combining classification systems potentially improves predictive accuracy, but outcomes have proven impossible to predict. Similar to improving binary classification with fusion, fusing ranking systems most commonly increases Pearson or Spearman correlations with a target when the input classifiers are “sufficiently good” (generalized as “accuracy”) and “sufficiently different” (generalized as “diversity”), but the individual and joint quantitative influence of these factors on the final outcome remains unknown. We resolve these issues. Building on our previous empirical work establishing the DIRAC (DIversity of Ranks and ACcuracy) framework, which accurately predicts the outcome of fusing binary classifiers, we demonstrate that the DIRAC framework similarly explains the outcome of fusing ranking systems. Specifically, precise geometric representation of diversity and accuracy as angle-based distances within rank-based combinatorial structures (permutahedra) fully captures their synergistic roles in rank approximation, uncouples them from the specific metrics of a given problem, and represents them as generally as possible.
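A small worked example may help show what "fusing ranking systems" and measuring Spearman correlation with a target look like in practice. The Python sketch below fuses two rankings by averaging their ranks and compares each result against a target ordering. Mean-rank fusion and the helper names (`ranks`, `spearman`, `fuse_by_mean_rank`) are assumptions for illustration only; this is not the DIRAC framework's geometric analysis.

```python
import math


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def fuse_by_mean_rank(system_a, system_b):
    """Fuse two scoring systems by averaging the ranks they assign to each item."""
    return [(ra + rb) / 2 for ra, rb in zip(ranks(system_a), ranks(system_b))]


target = [1, 2, 3, 4, 5]
system_a = [2, 1, 3, 4, 5]  # errs on the first two items
system_b = [1, 2, 3, 5, 4]  # errs on the last two items
fused = fuse_by_mean_rank(system_a, system_b)
print(spearman(system_a, target), spearman(system_b, target), spearman(fused, target))
```

In this toy case the two inputs are equally accurate but make different mistakes, and the fused ranking correlates more strongly with the target than either input does.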

Citations: 0
A weighted two-stage sequence alignment framework to identify motifs from ChIP-exo data
IF 6.5 Q2 Decision Sciences Pub Date : 2024-02-02 DOI: 10.1016/j.patter.2024.100927
Yang Li, Yizhong Wang, Cankun Wang, Anjun Ma, Qin Ma, Bingqiang Liu

In this study, we introduce TESA (weighted two-stage alignment), an innovative motif prediction tool that refines the identification of DNA-binding protein motifs, essential for deciphering transcriptional regulatory mechanisms. Unlike traditional algorithms that rely solely on sequence data, TESA integrates the high-resolution chromatin immunoprecipitation (ChIP) signal, specifically from ChIP-exonuclease (ChIP-exo), by assigning weights to sequence positions, thereby enhancing motif discovery. TESA employs a nuanced approach combining a binomial distribution model with a graph model, further supported by a “bookend” model, to improve the accuracy of predicting motifs of varying lengths. Our evaluation, utilizing an extensive compilation of 90 prokaryotic ChIP-exo datasets from proChIPdb and 167 H. sapiens datasets, compared TESA’s performance against seven established tools. The results indicate TESA’s improved precision in motif identification, suggesting its valuable contribution to the field of genomic research.

Citations: 0
FedEYE: A scalable and flexible end-to-end federated learning platform for ophthalmology
IF 6.5 Q2 Decision Sciences Pub Date : 2024-02-02 DOI: 10.1016/j.patter.2024.100928
Bingjie Yan, Danmin Cao, Xinlong Jiang, Yiqiang Chen, Weiwei Dai, Fan Dong, Wuliang Huang, Teng Zhang, Chenlong Gao, Qian Chen, Zhen Yan, Zhirui Wang

Data-driven machine learning is a promising approach for building high-quality, accurate, and robust models from ophthalmic medical data. Ophthalmic medical data, however, presently exist across disparate data silos with privacy limitations, making centralized training challenging. In addition, because ophthalmologists may not specialize in machine learning and artificial intelligence (AI), they face considerable impediments in the associated research. To address these issues, we design and develop FedEYE, a scalable and flexible end-to-end ophthalmic federated learning platform. During FedEYE design, we adhere to four fundamental design principles, ensuring that ophthalmologists can effortlessly create independent and federated AI research tasks. Benefiting from these design principles and its architecture, FedEYE offers numerous key features, including rich and customizable capabilities, separation of concerns, scalability, and flexible deployment. We also validated the applicability of FedEYE by employing several prevalent neural networks on ophthalmic disease image classification tasks.

Citations: 0