Harmonic Reasoning in Large Language Models
Anna Kruspe
arXiv - CS - Sound, 2024-09-09 (arXiv:2409.05521)
Citations: 0
Abstract
Large Language Models (LLMs) are widely used for many different purposes,
including creative tasks in the arts. However, these models sometimes struggle
with specific reasoning tasks, especially those that involve logical thinking
and counting. This paper examines how well LLMs understand and reason about
musical tasks such as deriving a note from a given interval and identifying
chords and scales. We tested GPT-3.5 and GPT-4o on these tasks. Our results
show that while LLMs perform well at determining notes from intervals, they
struggle with more complex tasks such as recognizing chords and scales. These
results reveal clear limits in the abilities of current LLMs and indicate where
they need improvement; addressing these gaps could strengthen their reasoning
in artistic as well as other complex domains. We also provide an automatically
generated benchmark dataset for the described tasks.
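The abstract names the tasks but not how the benchmark is built. As a rough illustration of what a "correct" answer means for each task, here is a minimal Python sketch that computes ground-truth answers and generates interval questions. It assumes 12-tone equal temperament, pitch classes only (no octaves), sharp-only spelling, and a small chord and scale inventory. Every name in it (`note_from_interval`, `spell`, `identify`, `interval_question`) is hypothetical and is not taken from the paper's code or dataset.

```python
import random

# Hypothetical sketch, not the paper's generator: pitch classes only,
# 12-tone equal temperament, sharp spelling (so Db is written C#).
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Simple interval names -> semitone distance above the root.
INTERVALS = {
    "minor second": 1, "major second": 2, "minor third": 3,
    "major third": 4, "perfect fourth": 5, "tritone": 6,
    "perfect fifth": 7, "minor sixth": 8, "major sixth": 9,
    "minor seventh": 10, "major seventh": 11,
}

# Chord and scale types as semitone offsets from the root/tonic
# (an illustrative subset, assumed here; not the paper's inventory).
CHORDS = {
    "major": (0, 4, 7),
    "minor": (0, 3, 7),
    "diminished": (0, 3, 6),
    "augmented": (0, 4, 8),
}
SCALES = {
    "major": (0, 2, 4, 5, 7, 9, 11),
    "natural minor": (0, 2, 3, 5, 7, 8, 10),
}


def note_from_interval(root: str, interval: str) -> str:
    """Interval task: the note that lies `interval` above `root`."""
    return NOTES[(NOTES.index(root) + INTERVALS[interval]) % 12]


def spell(root: str, offsets: tuple[int, ...]) -> list[str]:
    """List the notes of a chord or scale built on `root`."""
    start = NOTES.index(root)
    return [NOTES[(start + o) % 12] for o in offsets]


def identify(notes: list[str], table: dict[str, tuple[int, ...]]) -> list[str]:
    """Chord/scale task: every 'root quality' label whose note set equals `notes`.

    Returns a list because some note sets have several valid names
    (augmented triads, relative major/minor scales).
    """
    target = set(notes)
    return [
        f"{root} {quality}"
        for root in NOTES
        for quality, offsets in table.items()
        if set(spell(root, offsets)) == target
    ]


def interval_question(rng: random.Random) -> tuple[str, str]:
    """Generate one (prompt, answer) pair for the interval task."""
    root = rng.choice(NOTES)
    interval = rng.choice(list(INTERVALS))
    prompt = f"What note is a {interval} above {root}?"
    return prompt, note_from_interval(root, interval)


if __name__ == "__main__":
    print(note_from_interval("A", "minor third"))         # C
    print(identify(["C", "E", "G"], CHORDS))               # ['C major']
    print(identify(["C", "E", "G#"], CHORDS))              # ['C augmented', 'E augmented', 'G# augmented']
    print(identify(spell("C", SCALES["major"]), SCALES))   # ['C major', 'A natural minor']
    print(interval_question(random.Random(0)))
```

Returning every matching label rather than a single one is a deliberate choice in this sketch: the notes C, E, G# name three augmented triads, and the white-key scale is both C major and A natural minor, so a grader that accepts only one name would mark some correct model answers as wrong.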