Sampling Latent Material-Property Information From LLM-Derived Embedding Representations
Luke P. J. Gilligan, Matteo Cobelli, Hasan M. Sayeed, Taylor D. Sparks, Stefano Sanvito
arXiv - PHYS - Materials Science, 2024-09-18, arXiv:2409.11971
Abstract
Vector embeddings derived from large language models (LLMs) show promise in
capturing latent information from the literature. Interestingly, these can be
integrated into material embeddings, potentially useful for data-driven
predictions of materials properties. We investigate the extent to which
LLM-derived vectors capture the desired information and their potential to
provide insights into material properties without additional training. Our
findings indicate that, although LLMs can be used to generate representations
reflecting certain property information, extracting the embeddings requires
identifying the optimal contextual clues and appropriate comparators. Despite
this restriction, it appears that LLMs still have the potential to be useful in
generating meaningful materials-science representations.
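
To make the idea of reading property information from embeddings "without additional training" more concrete, the following is a minimal sketch of one way such a comparison could be set up. It assumes cosine similarity as the measure and a pair of "high" and "low" comparator vectors. These choices, the function names, and the three-dimensional toy vectors are illustrative assumptions and are not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_comparators(material_vecs, high_vec, low_vec):
    """Score each material as sim(material, high) - sim(material, low).

    Returns (name, score) pairs sorted from highest to lowest score.
    No model is trained; only fixed vectors are compared.
    """
    scores = {
        name: cosine_similarity(vec, high_vec) - cosine_similarity(vec, low_vec)
        for name, vec in material_vecs.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy vectors standing in for LLM-derived embeddings (made up, not real data).
toy_materials = {
    "material_A": [0.9, 0.1, 0.0],
    "material_B": [0.1, 0.9, 0.0],
    "material_C": [0.5, 0.5, 0.1],
}
# Hypothetical comparators, e.g. embeddings of phrases describing a high or
# low value of some property. Which comparators to use is an open choice.
high_comparator = [1.0, 0.0, 0.0]
low_comparator = [0.0, 1.0, 0.0]

ranking = rank_by_comparators(toy_materials, high_comparator, low_comparator)
for name, score in ranking:
    print(f"{name}: {score:+.3f}")
```

In a scheme like this, the ranking depends entirely on the comparator vectors and on the context used to produce each embedding, which is consistent with the abstract's caution that both must be chosen carefully.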