ValueCompass: A Framework of Fundamental Values for Human-AI Alignment
Hua Shen, Tiffany Knearem, Reshmi Ghosh, Yu-Ju Yang, Tanushree Mitra, Yun Huang
arXiv - CS - Human-Computer Interaction, 2024-09-15. DOI: arxiv-2409.09586 (https://doi.org/arxiv-2409.09586)
Abstract
As AI systems become more advanced, ensuring their alignment with a diverse
range of individual and societal values becomes increasingly critical. But how
can we capture fundamental human values and assess the degree to which AI
systems align with them? We introduce ValueCompass, a framework of fundamental
values, grounded in psychological theory and a systematic review, to identify
and evaluate human-AI alignment. We apply ValueCompass to measure the value
alignment of humans and language models (LMs) across four real-world vignettes:
collaborative writing, education, the public sector, and healthcare. Our
findings uncover risky misalignment between humans and LMs, such as LMs
endorsing values like "Choose Own Goals" that most humans reject. We also
observe that values vary across vignettes, underscoring the necessity for
context-aware AI alignment strategies. This work provides insights into the
design space of human-AI alignment, offering foundations for developing AI that
responsibly reflects societal values and ethics.
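The abstract does not specify the paper's scoring procedure, but the kind of comparison it describes — surveying humans and an LM on the same set of values and flagging where they pull in opposite directions — can be sketched as follows. All function names, the +1/-1 rating encoding, and the toy data are illustrative assumptions, not the paper's actual instrument.

```python
# Hypothetical sketch of human-LM value-alignment comparison; the rating
# scheme (+1 agree / -1 disagree) and data below are invented for illustration.

def misaligned_values(human_ratings, lm_ratings):
    """Return values where the human majority and the LM take opposite stances.

    human_ratings: dict mapping value name -> list of +1/-1 votes from people.
    lm_ratings:    dict mapping value name -> single +1/-1 stance from the LM.
    """
    flagged = []
    for value, votes in human_ratings.items():
        human_mean = sum(votes) / len(votes)  # >0: humans mostly agree
        lm_stance = lm_ratings[value]
        # Opposite signs mean the LM endorses what humans reject, or vice versa.
        if human_mean * lm_stance < 0:
            flagged.append(value)
    return flagged

# Toy data echoing the abstract's example: the LM endorses "Choose Own Goals",
# which most surveyed humans reject.
humans = {"Choose Own Goals": [-1, -1, -1, 1], "Honesty": [1, 1, 1, 1]}
lm = {"Choose Own Goals": 1, "Honesty": 1}
print(misaligned_values(humans, lm))  # -> ['Choose Own Goals']
```

Running such a check separately per vignette (writing, education, public sector, healthcare) would surface the context-dependence the abstract reports.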