Hideo Kobayashi, Wuwei Lan, Peng Shi, Shuaichen Chang, Jiang Guo, Henghui Zhu, Zhiguo Wang, Patrick Ng
{"title":"You Only Read Once (YORO): Learning to Internalize Database Knowledge for Text-to-SQL","authors":"Hideo Kobayashi, Wuwei Lan, Peng Shi, Shuaichen Chang, Jiang Guo, Henghui Zhu, Zhiguo Wang, Patrick Ng","doi":"arxiv-2409.12172","DOIUrl":null,"url":null,"abstract":"While significant progress has been made on the text-to-SQL task, recent\nsolutions repeatedly encode the same database schema for every question,\nresulting in unnecessary high inference cost and often overlooking crucial\ndatabase knowledge. To address these issues, we propose You Only Read Once\n(YORO), a novel paradigm that directly internalizes database knowledge into the\nparametric knowledge of a text-to-SQL model during training and eliminates the\nneed for schema encoding during inference. YORO significantly reduces the input\ntoken length by 66%-98%. Despite its shorter inputs, our empirical results\ndemonstrate YORO's competitive performances with traditional systems on three\nbenchmarks as well as its significant outperformance on large databases.\nFurthermore, YORO excels in handling questions with challenging value\nretrievals such as abbreviation.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"45 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computation and Language","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.12172","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
While significant progress has been made on the text-to-SQL task, recent
solutions repeatedly encode the same database schema for every question,
resulting in unnecessary high inference cost and often overlooking crucial
database knowledge. To address these issues, we propose You Only Read Once
(YORO), a novel paradigm that directly internalizes database knowledge into the
parametric knowledge of a text-to-SQL model during training and eliminates the
need for schema encoding during inference. YORO significantly reduces the input
token length by 66%-98%. Despite its shorter inputs, our empirical results
demonstrate YORO's competitive performances with traditional systems on three
benchmarks as well as its significant outperformance on large databases.
Furthermore, YORO excels in handling questions with challenging value
retrievals such as abbreviation.