A Controlled Study on Long Context Extension and Generalization in LLMs
Yi Lu, Jing Nathan Yan, Songlin Yang, Justin T. Chiu, Siyu Ren, Fei Yuan, Wenting Zhao, Zhiyong Wu, Alexander M. Rush
arXiv:2409.12181 [cs.CL], 18 September 2024
{"title":"关于语言学习者长语境扩展和泛化的对照研究","authors":"Yi Lu, Jing Nathan Yan, Songlin Yang, Justin T. Chiu, Siyu Ren, Fei Yuan, Wenting Zhao, Zhiyong Wu, Alexander M. Rush","doi":"arxiv-2409.12181","DOIUrl":null,"url":null,"abstract":"Broad textual understanding and in-context learning require language models\nthat utilize full document contexts. Due to the implementation challenges\nassociated with directly training long-context models, many methods have been\nproposed for extending models to handle long contexts. However, owing to\ndifferences in data and model classes, it has been challenging to compare these\napproaches, leading to uncertainty as to how to evaluate long-context\nperformance and whether it differs from standard evaluation. We implement a\ncontrolled protocol for extension methods with a standardized evaluation,\nutilizing consistent base models and extension data. Our study yields several\ninsights into long-context behavior. First, we reaffirm the critical role of\nperplexity as a general-purpose performance indicator even in longer-context\ntasks. Second, we find that current approximate attention methods\nsystematically underperform across long-context tasks. Finally, we confirm that\nexact fine-tuning based methods are generally effective within the range of\ntheir extension, whereas extrapolation remains challenging. All codebases,\nmodels, and checkpoints will be made available open-source, promoting\ntransparency and facilitating further research in this critical area of AI\ndevelopment.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"8 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Controlled Study on Long Context Extension and Generalization in LLMs\",\"authors\":\"Yi Lu, Jing Nathan Yan, Songlin Yang, Justin T. Chiu, Siyu Ren, Fei Yuan, Wenting Zhao, Zhiyong Wu, Alexander M. Rush\",\"doi\":\"arxiv-2409.12181\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Broad textual understanding and in-context learning require language models\\nthat utilize full document contexts. Due to the implementation challenges\\nassociated with directly training long-context models, many methods have been\\nproposed for extending models to handle long contexts. However, owing to\\ndifferences in data and model classes, it has been challenging to compare these\\napproaches, leading to uncertainty as to how to evaluate long-context\\nperformance and whether it differs from standard evaluation. We implement a\\ncontrolled protocol for extension methods with a standardized evaluation,\\nutilizing consistent base models and extension data. Our study yields several\\ninsights into long-context behavior. First, we reaffirm the critical role of\\nperplexity as a general-purpose performance indicator even in longer-context\\ntasks. Second, we find that current approximate attention methods\\nsystematically underperform across long-context tasks. Finally, we confirm that\\nexact fine-tuning based methods are generally effective within the range of\\ntheir extension, whereas extrapolation remains challenging. 
All codebases,\\nmodels, and checkpoints will be made available open-source, promoting\\ntransparency and facilitating further research in this critical area of AI\\ndevelopment.\",\"PeriodicalId\":501030,\"journal\":{\"name\":\"arXiv - CS - Computation and Language\",\"volume\":\"8 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Computation and Language\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.12181\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computation and Language","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.12181","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Controlled Study on Long Context Extension and Generalization in LLMs
Broad textual understanding and in-context learning require language models that utilize full document contexts. Due to the implementation challenges associated with directly training long-context models, many methods have been proposed for extending models to handle long contexts. However, owing to differences in data and model classes, it has been challenging to compare these approaches, leading to uncertainty as to how to evaluate long-context performance and whether it differs from standard evaluation. We implement a controlled protocol for extension methods with a standardized evaluation, utilizing consistent base models and extension data. Our study yields several insights into long-context behavior. First, we reaffirm the critical role of perplexity as a general-purpose performance indicator even in longer-context tasks.
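For concreteness, perplexity is the exponentiated mean per-token negative log-likelihood. The sketch below shows one way to compute it over a long document in chunks with the Hugging Face transformers API; it is illustrative only, not the paper's evaluation code, and "gpt2" is a hypothetical stand-in for the study's base models.

```python
# Minimal perplexity sketch: perplexity = exp(mean token-level NLL).
# Illustrative only; not the paper's evaluation code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # hypothetical stand-in for the study's base models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def perplexity(text: str, max_len: int = 1024) -> float:
    """Exponentiated mean token NLL, computed over non-overlapping chunks."""
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    nlls, n_tokens = [], 0
    for start in range(0, len(ids) - 1, max_len):
        chunk = ids[start : start + max_len + 1].unsqueeze(0)
        with torch.no_grad():
            # labels=input_ids makes the model return the mean cross-entropy
            # over the chunk's shifted next-token predictions
            loss = model(chunk, labels=chunk).loss
        nlls.append(loss * (chunk.size(1) - 1))  # back to a summed NLL
        n_tokens += chunk.size(1) - 1
    return torch.exp(torch.stack(nlls).sum() / n_tokens).item()
```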
Second, we find that current approximate attention methods systematically underperform across long-context tasks.
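Approximate attention methods reduce the quadratic cost of exact attention by restricting which keys each query may attend to. A minimal sketch of one common pattern in this family, a causal sliding-window mask, follows; it is a hedged illustration of the general idea, not any specific method evaluated in the study.

```python
# Sketch of local (sliding-window) attention, one common approximate-attention
# pattern. Illustrative only; not a specific method from the study.
import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, window: int = 256):
    """Causal attention where each query sees at most `window` recent keys
    (including itself). q, k, v: (batch, heads, seq_len, head_dim)."""
    seq_len = q.size(-2)
    pos = torch.arange(seq_len, device=q.device)
    offset = pos[:, None] - pos[None, :]      # query index minus key index
    keep = (offset >= 0) & (offset < window)  # causal band of width `window`
    # NOTE: materializing the full (seq_len x seq_len) score matrix is O(n^2);
    # real implementations compute only the band. This shows the pattern only.
    scores = (q @ k.transpose(-2, -1)) / q.size(-1) ** 0.5
    scores = scores.masked_fill(~keep, float("-inf"))
    return F.softmax(scores, dim=-1) @ v
```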
Finally, we confirm that exact fine-tuning-based methods are generally effective within the range of their extension, whereas extrapolation remains challenging.
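A representative exact, fine-tuning-based recipe is linear position interpolation for rotary embeddings, which compresses target positions into the trained range before a brief fine-tune at the longer length. The sketch below shows the standard scaling rule under that assumption; it is not necessarily this paper's exact configuration.

```python
# Sketch of linear position interpolation for RoPE: to extend a model trained
# on `train_len` positions to `target_len`, positions are scaled down by
# train_len / target_len before computing rotary angles. Standard recipe shown
# for illustration; not necessarily this paper's exact setup.
import torch

def rope_angles(seq_len: int, head_dim: int, train_len: int, target_len: int,
                base: float = 10000.0) -> torch.Tensor:
    """Rotary angles with linear position interpolation applied."""
    scale = train_len / target_len if target_len > train_len else 1.0
    pos = torch.arange(seq_len, dtype=torch.float32) * scale  # compressed positions
    inv_freq = base ** (-torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim)
    return torch.outer(pos, inv_freq)  # (seq_len, head_dim // 2)

# e.g. extending a 4k-trained model to 32k compresses positions 8x, after
# which the model is fine-tuned at the longer length
angles = rope_angles(seq_len=32768, head_dim=128, train_len=4096, target_len=32768)
```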
All codebases, models, and checkpoints will be released as open source, promoting transparency and facilitating further research in this critical area of AI development.