
Patterns: Latest Articles

A comprehensive benchmark for COVID-19 predictive modeling using electronic health records in intensive care
IF 6.5 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-03-07 DOI: 10.1016/j.patter.2024.100951
Junyi Gao, Yinghao Zhu, Wenqing Wang, Zixiang Wang, Guiying Dong, Wen Tang, Hao Wang, Yasha Wang, Ewen M. Harrison, Liantao Ma
The COVID-19 pandemic highlighted the need for predictive deep-learning models in health care. However, practical prediction task design, fair comparison, and model selection for clinical applications remain a challenge. To address this, we introduce and evaluate two new prediction tasks—outcome-specific length-of-stay and early-mortality prediction for COVID-19 patients in intensive care—which better reflect clinical realities. We developed evaluation metrics, model adaptation designs, and open-source data preprocessing pipelines for these tasks while also evaluating 18 predictive models, including clinical scoring methods and traditional machine-learning, basic deep-learning, and advanced deep-learning models, tailored for electronic health record (EHR) data. Benchmarking results from two real-world COVID-19 EHR datasets are provided, and all results and trained models have been released on an online platform for use by clinicians and researchers. Our efforts contribute to the advancement of deep-learning and machine-learning research in pandemic predictive modeling.
Citations: 0
Can large language models reason about medical questions?
IF 6.5 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-03-01 DOI: 10.1016/j.patter.2024.100943
Valentin Liévin, Christoffer Egeberg Hother, Andreas Geert Motzfeldt, Ole Winther
Although large language models often produce impressive outputs, it remains unclear how they perform in real-world scenarios requiring strong reasoning skills and expert domain knowledge. We set out to investigate whether closed- and open-source models (GPT-3.5, Llama 2, etc.) can be applied to answer and reason about difficult real-world-based questions. We focus on three popular medical benchmarks (MedQA-US Medical Licensing Examination [USMLE], MedMCQA, and PubMedQA) and multiple prompting scenarios: chain of thought (CoT; think step by step), few shot, and retrieval augmentation. Based on an expert annotation of the generated CoTs, we found that InstructGPT can often read, reason, and recall expert knowledge. Last, by leveraging advances in prompt engineering (few-shot and ensemble methods), we demonstrated that GPT-3.5 not only yields calibrated predictive distributions but also reaches the passing score on three datasets: MedQA-USMLE (60.2%), MedMCQA (62.7%), and PubMedQA (78.2%). Open-source models are closing the gap: Llama 2 70B also passed the MedQA-USMLE with 62.5% accuracy.
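The ensemble step mentioned in this abstract can be illustrated with a simple majority vote over several sampled chain-of-thought completions. This is a minimal sketch, not the authors' procedure: the function name `ensemble_answer`, the voting rule, and the sample answers are all assumptions made for illustration.

```python
from collections import Counter


def ensemble_answer(sampled_answers):
    """Aggregate the final answers of several sampled chains of thought."""
    if not sampled_answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(sampled_answers)
    total = len(sampled_answers)
    # Vote shares act as a rough predictive distribution over the options.
    distribution = {option: n / total for option, n in counts.items()}
    # Ties resolve to the option that was sampled first.
    best = max(distribution, key=distribution.get)
    return best, distribution


# Hypothetical answers extracted from five sampled completions.
answer, dist = ensemble_answer(["B", "A", "B", "C", "B"])
print(answer, dist)
```

Treating vote shares as probabilities is one plausible way to obtain a predictive distribution whose calibration can then be checked; the paper may define its distribution differently.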
Citations: 0
FRAMM: Fair ranking with missing modalities for clinical trial site selection
IF 6.5 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-03-01 DOI: 10.1016/j.patter.2024.100944
Brandon Theodorou, Lucas Glass, Cao Xiao, Jimeng Sun
The underrepresentation of gender, racial, and ethnic minorities in clinical trials is a problem undermining the efficacy of treatments on minorities and preventing precise estimates of the effects within these subgroups. We propose FRAMM, a deep reinforcement learning framework for fair trial site selection to help address this problem. We focus on two real-world challenges: the data modalities used to guide selection are often incomplete for many potential trial sites, and the site selection needs to simultaneously optimize for both enrollment and diversity. To address the missing data challenge, FRAMM has a modality encoder with a masked cross-attention mechanism for bypassing missing data. To make efficient trade-offs, FRAMM uses deep reinforcement learning with a reward function designed to simultaneously optimize for both enrollment and fairness. We evaluate FRAMM using real-world historical clinical trials and show that it outperforms the leading baseline in enrollment-only settings while also greatly improving diversity.
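The idea of masked cross-attention over incomplete modalities can be sketched as follows. This is a simplified illustration under stated assumptions, not the FRAMM encoder: it uses a single head, no learned projections, and a hypothetical function name `masked_cross_attention`; the toy vectors are invented.

```python
import math


def masked_cross_attention(query, keys, values, present):
    """Attend over modality embeddings, skipping those flagged as missing."""
    scale = math.sqrt(len(query))
    # Scaled dot-product scores for the modalities that are present only.
    scores = [
        sum(q * k for q, k in zip(query, key)) / scale
        for key, ok in zip(keys, present)
        if ok
    ]
    if not scores:
        # Every modality is missing: return a zero vector and zero weights.
        return [0.0] * len(values[0]), [0.0] * len(present)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
    norm = sum(exps)
    shares = iter(e / norm for e in exps)
    # Missing modalities receive exactly zero attention weight.
    weights = [next(shares) if ok else 0.0 for ok in present]
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights


# Three hypothetical modalities; the third is missing for this site.
out, w = masked_cross_attention(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]],
    values=[[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]],
    present=[True, True, False],
)
print(out, w)
```

Because the mask is applied before the softmax, the placeholder embedding of the missing modality cannot influence the output, whatever values it holds.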
Citations: 0
Federated learning for multi-omics: A performance evaluation in Parkinson’s disease
IF 6.5 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-03-01 DOI: 10.1016/j.patter.2024.100945
Benjamin P. Danek, Mary B. Makarious, Anant Dadu, Dan Vitale, Paul Suhwan Lee, Andrew B. Singleton, Mike A. Nalls, Jimeng Sun, Faraz Faghri
While machine learning (ML) research has recently grown in popularity, its application in the omics domain is constrained by limited access to the sufficiently large, high-quality datasets needed to train ML models. Federated learning (FL) represents an opportunity to enable collaborative curation of such datasets among participating institutions. We compare the simulated performance of several models trained using FL against classically trained ML models on the task of multi-omics Parkinson’s disease prediction. We find that FL model performance tracks that of centrally trained ML models: the most performant FL model achieves an AUC-PR of 0.876 ± 0.009, which is 0.014 ± 0.003 lower than its centrally trained counterpart. We also determine that the dispersion of samples within a federation plays a meaningful role in model performance. Our study implements several open-source FL frameworks and aims to highlight some of the challenges and opportunities when applying these collaborative methods in multi-omics studies.
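To make the federated setting concrete, the sketch below shows one common aggregation rule, sample-size-weighted parameter averaging (often called FedAvg). The abstract does not say which aggregation strategies were evaluated, so this choice, the function name `federated_average`, and the toy numbers are assumptions for illustration only.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample counts."""
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    # Each site shares parameters only; raw data never leaves the site.
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Two hypothetical sites holding 1 and 3 samples respectively.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_weights)
```

Weighting by sample count means that how samples are spread across sites directly shapes the aggregated model, which is consistent with the abstract's remark that sample dispersion affects performance.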
Citations: 0
An evaluation of synthetic data augmentation for mitigating covariate bias in health data
IF 6.5 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-02-29 DOI: 10.1016/j.patter.2024.100946
Lamin Juwara, Alaa El-Hussuna, Khaled El Emam
Data bias is a major concern in biomedical research, especially when evaluating large-scale observational datasets. It leads to imprecise predictions and inconsistent estimates in standard regression models. We compare the performance of commonly used bias-mitigating approaches (resampling, algorithmic, and post hoc approaches) against a synthetic data-augmentation method that utilizes sequential boosted decision trees to synthesize under-represented groups. The approach is called synthetic minority augmentation (SMA). Through simulations and analysis of real health datasets on a logistic regression workload, the approaches are evaluated across various bias scenarios (types and severity levels). Performance was assessed based on area under the curve, calibration (Brier score), precision of parameter estimates, confidence interval overlap, and fairness. Overall, SMA produces the closest results to the ground truth in low to medium bias (50% or less missing proportion). In high bias (80% or more missing proportion), the advantage of SMA is not obvious, with no specific method consistently outperforming others.
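Of the metrics listed above, the Brier score is the simplest to state: it is the mean squared difference between predicted probabilities and binary outcomes, with lower values indicating better calibration and accuracy. The following is a minimal sketch with invented predictions; the function name `brier_score` is illustrative.

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes) or not probabilities:
        raise ValueError("need equally sized, non-empty inputs")
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(outcomes)


# Hypothetical predicted risks and observed outcomes for three patients.
score = brier_score([0.9, 0.2, 0.6], [1, 0, 1])
print(round(score, 4))
```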
Citations: 0
TCGA-Reports: A machine-readable pathology report resource for benchmarking text-based AI models
IF 6.5 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-02-21 DOI: 10.1016/j.patter.2024.100933
Jenna Kefeli, Nicholas Tatonetti
Citations: 0
Propagating variational model uncertainty for bioacoustic call label smoothing
IF 6.5 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-02-12 DOI: 10.1016/j.patter.2024.100932
Georgios Rizos, Jenna Lawson, Simon Mitchell, Pranay Shah, Xin Wen, Cristina Banks-Leite, Robert Ewers, Björn W. Schuller
Along with propagating the input toward making a prediction, Bayesian neural networks also propagate uncertainty. This has the potential to guide the training process by rejecting predictions of low confidence, and recent variational Bayesian methods can do so without Monte Carlo sampling of weights. Here, we apply sample-free methods for wildlife call detection on recordings made via passive acoustic monitoring equipment in the animals’ natural habitats. We further propose uncertainty-aware label smoothing, where the smoothing probability is dependent on sample-free predictive uncertainty, in order to downweight data samples that should contribute less to the loss value. We introduce a bioacoustic dataset recorded in Malaysian Borneo, containing overlapping calls from 30 species. On that dataset, our proposed method achieves an absolute percentage improvement of around 1.5 points on area under the receiver operating characteristic (AU-ROC), 13 points in F1, and 19.5 points in expected calibration error (ECE) compared to the point-estimate network baseline averaged across all target classes.
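One way to picture uncertainty-aware label smoothing is to let the smoothing amount grow with the predictive uncertainty of each sample. The sketch below is an assumption-laden illustration, not the authors' formulation: the linear mapping from uncertainty to smoothing, the `max_smoothing` parameter, and the function name `smooth_label` are all hypothetical.

```python
def smooth_label(label, uncertainty, max_smoothing=0.2):
    """Soften a binary target more strongly when predictive uncertainty is high."""
    if not 0.0 <= uncertainty <= 1.0:
        raise ValueError("uncertainty is assumed to be normalised to [0, 1]")
    # Assumed linear mapping from uncertainty to smoothing probability.
    eps = max_smoothing * uncertainty
    # Mix the hard label with the uniform binary target 0.5.
    return label * (1.0 - eps) + 0.5 * eps


print(smooth_label(1, 0.0))  # confident sample keeps its hard label
print(smooth_label(1, 1.0))  # uncertain sample gets a softened target
```

A softened target pulls the loss for an uncertain sample less strongly toward the hard label, which is the intuition behind letting such samples contribute less to training.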
Citations: 0
Social construction of XAI: Do we need one definition to rule them all?
IF 6.5 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-02-09 DOI: 10.1016/j.patter.2024.100926
Upol Ehsan, Mark O Riedl

In this opinion piece, Upol Ehsan and Mark Riedl argue that a singular, monolithic definition of explainable AI (XAI) is neither feasible nor desirable at this stage of XAI's development.

Patterns, volume 5, issue 2, article 100926. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10873153/pdf/
Citations: 0
The DIRAC framework: Geometric structure underlies roles of diversity and accuracy in combining classifiers
IF 6.5 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-02-05 DOI: 10.1016/j.patter.2024.100924
Matthew J. Sniatynski, John A. Shepherd, Lynne R. Wilkens, D. Frank Hsu, Bruce S. Kristal

Combining classification systems potentially improves predictive accuracy, but outcomes have proven impossible to predict. Similar to improving binary classification with fusion, fusing ranking systems most commonly increases Pearson or Spearman correlations with a target when the input classifiers are “sufficiently good” (generalized as “accuracy”) and “sufficiently different” (generalized as “diversity”), but the individual and joint quantitative influence of these factors on the final outcome remains unknown. We resolve these issues. Building on our previous empirical work establishing the DIRAC (DIversity of Ranks and ACcuracy) framework, which accurately predicts the outcome of fusing binary classifiers, we demonstrate that the DIRAC framework similarly explains the outcome of fusing ranking systems. Specifically, precise geometric representation of diversity and accuracy as angle-based distances within rank-based combinatorial structures (permutahedra) fully captures their synergistic roles in rank approximation, uncouples them from the specific metrics of a given problem, and represents them as generally as possible.
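A toy example helps show how fusing two rankers that are each reasonably accurate, but wrong in different places, can raise correlation with the target. The sketch below is not the DIRAC framework itself: it simply averages ranks and measures Pearson correlation on rank vectors (which equals Spearman correlation when there are no ties). The rank vectors and the function name `pearson` are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


target = [1, 2, 3, 4, 5]
ranker_a = [2, 1, 3, 4, 5]  # swaps the top two items
ranker_b = [1, 2, 3, 5, 4]  # swaps the bottom two items
# Fuse by averaging the rank each system assigns to every item.
fused = [(a + b) / 2 for a, b in zip(ranker_a, ranker_b)]

print(pearson(ranker_a, target), pearson(ranker_b, target), pearson(fused, target))
```

Here each ranker reaches a correlation of 0.9 on its own, while the fused ranking scores higher because the two systems' errors do not overlap.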

Citations: 0
A weighted two-stage sequence alignment framework to identify motifs from ChIP-exo data
IF 6.5 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-02-02 DOI: 10.1016/j.patter.2024.100927
Yang Li, Yizhong Wang, Cankun Wang, Anjun Ma, Qin Ma, Bingqiang Liu

In this study, we introduce TESA (weighted two-stage alignment), an innovative motif prediction tool that refines the identification of DNA-binding protein motifs, essential for deciphering transcriptional regulatory mechanisms. Unlike traditional algorithms that rely solely on sequence data, TESA integrates the high-resolution chromatin immunoprecipitation (ChIP) signal, specifically from ChIP-exonuclease (ChIP-exo), by assigning weights to sequence positions, thereby enhancing motif discovery. TESA employs a nuanced approach combining a binomial distribution model with a graph model, further supported by a “bookend” model, to improve the accuracy of predicting motifs of varying lengths. Our evaluation, utilizing an extensive compilation of 90 prokaryotic ChIP-exo datasets from proChIPdb and 167 H. sapiens datasets, compared TESA’s performance against seven established tools. The results indicate TESA’s improved precision in motif identification, suggesting its valuable contribution to the field of genomic research.

Citations: 0