Confidence-weighted integration of human and machine judgments for superior decision-making

Felipe Yáñez, Xiaoliang Luo, Omar Valerio Minero, Bradley C. Love

arXiv - QuanBio - Neurons and Cognition, published 2024-08-15
DOI: https://doi.org/arxiv-2408.08083
Citations: 0
Abstract
Large language models (LLMs) have emerged as powerful tools in various
domains. Recent studies have shown that LLMs can surpass humans in certain
tasks, such as predicting the outcomes of neuroscience studies. What role does
this leave for humans in the overall decision process? One possibility is that
humans, despite performing worse than LLMs, can still add value when teamed
with them. A human and machine team can surpass each individual teammate when
team members' confidence is well-calibrated and team members diverge in which
tasks they find difficult (i.e., calibration and diversity are needed). We
simplified and extended a Bayesian approach to combining judgments using a
logistic regression framework that integrates confidence-weighted judgments for
any number of team members. Using this straightforward method, we demonstrated
in a neuroscience forecasting task that, even when humans were inferior to
LLMs, their combination with one or more LLMs consistently improved team
performance. Our hope is that this simple and effective strategy for
integrating the judgments of humans and machines will lead to productive
collaborations.
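The combination method described above can be illustrated with a small sketch. The following is a hypothetical toy reconstruction, not the authors' actual code or data: it assumes each team member reports a binary judgment plus a confidence, encodes each member as a signed, confidence-weighted feature, and fits a logistic regression over those features, so a simulated human and a simulated LLM are integrated into a single team prediction.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
truth = rng.integers(0, 2, n)  # simulated true outcomes (0/1)

def simulate_member(truth, accuracy, rng):
    """Simulate one team member's signed, confidence-weighted judgments.

    Each member picks the correct outcome with probability `accuracy`
    and reports a confidence in [0.5, 1]. The returned feature is
    positive when the member favors outcome 1, negative otherwise,
    scaled by confidence.
    """
    correct = rng.random(truth.size) < accuracy
    choice = np.where(correct, truth, 1 - truth)
    conf = rng.uniform(0.5, 1.0, truth.size)
    return np.where(choice == 1, conf, -conf)

# A weaker human teammate and a stronger LLM teammate (accuracies are
# illustrative, not values from the paper).
human = simulate_member(truth, 0.65, rng)
llm = simulate_member(truth, 0.80, rng)

# One column per team member; the framework extends to any number of
# members by stacking more columns.
X = np.column_stack([human, llm])

# Logistic regression learns a weight per member, effectively
# calibrating how much each judgment should count.
model = LogisticRegression().fit(X, truth)
team_acc = model.score(X, truth)
print(f"team accuracy: {team_acc:.2f}")
```

In this setup the learned coefficients play the role of per-member calibration weights: a member whose signed confidences track the truth more reliably receives a larger weight, which is how a weaker human can still add value alongside a stronger LLM when their errors diverge.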