A Big Bench for Large-Scale Database Grounded Text-to-SQLs

coffin5257 · on May 9, 2023

BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) is a pioneering cross-domain dataset that examines the impact of extensive database contents on text-to-SQL parsing. The dataset introduces three additional challenges that resonate with real-world scenarios: 1) the need for parsers to reason over 'dirty' values; 2) the necessity of incorporating external knowledge into parsers; and 3) the requirement to consider efficiency when executing SQL queries. BIRD contains over 12,751 unique question-SQL pairs and 95 large databases with a total size of 33.4 GB. It also covers more than 37 professional domains, such as blockchain, hockey, healthcare, and education.
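To make the three challenges concrete, here is a minimal sketch using Python's built-in `sqlite3` module. Everything in it is an assumption made for illustration and is not taken from BIRD itself: the `employee` table, its rows, the `US$52,000.00` salary format, the "high earner" threshold standing in for external knowledge, and the `run_timed` helper.

```python
import sqlite3
import time

# Hypothetical toy database; not part of BIRD.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Ann", "US$52,000.00"), ("Bob", "US$48,500.00"), ("Cy", "US$61,250.00")],
)

# 2) External knowledge (assumed): "a high earner has a salary above 50000".
#    The question alone ("Who are the high earners?") does not state this.
HIGH_EARNER_THRESHOLD = 50000

# 1) Dirty values: salary is stored as text with a currency prefix and
#    thousands separators, so it is cleaned before any numeric comparison.
CLEAN_SALARY = "CAST(REPLACE(SUBSTR(salary, 4), ',', '') AS REAL)"


def run_timed(sql, params=()):
    """Run a query and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    return rows, time.perf_counter() - start


# 3) Efficiency: record how long the query takes, not only what it returns.
rows, elapsed = run_timed(
    f"SELECT name FROM employee WHERE {CLEAN_SALARY} > ? ORDER BY name",
    (HIGH_EARNER_THRESHOLD,),
)
high_earners = [row[0] for row in rows]
print(high_earners, f"{elapsed:.6f}s")
```

On a table this small the timing is not meaningful; it only shows where a runtime measurement would sit next to a correctness check when the databases are large.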