As AI continues to transform how we interact with data, text-to-SQL models are emerging as powerful tools to bridge the gap between natural language and structured queries. Instead of manually writing SQL, imagine asking a model:
“What is the total revenue generated from organic products in California in the last six months?”
…and instantly receiving a well-formed SQL query!
This project was an exciting journey where I fine-tuned a large language model (LLM) to generate SQL queries from natural language. Throughout this process, I explored fine-tuning techniques, evaluation challenges, prompt engineering, and efficiency optimizations. In this blog, I will walk you through my approach, key concepts like LoRA (Low-Rank Adaptation) and QLoRA (Quantized LoRA), the hurdles I faced, and what I learned.
While general-purpose LLMs can generate SQL, they often struggle with:
🔹 Domain-Specific Queries — They lack industry-specific nuances.
🔹 Schema Awareness — Without knowledge of database structures, they may generate invalid queries.
🔹 Execution Accuracy — A query might be syntactically correct but logically incorrect.
Fine-tuning an LLM on structured datasets with diverse query types can significantly improve SQL generation accuracy, making the model more context-aware and adaptable to real-world use cases.
Model Selection: Why Unsloth’s Gemma-2-2B & LLaMA 3.2?
Since fine-tuning large models can be resource-intensive, I opted for Unsloth’s Gemma-2-2B and LLaMA 3.2 (4-bit quantized), which are optimized for efficient training. Key reasons:
- LLaMA 3.2 — A powerful open-weight LLM with strong reasoning capabilities.
- Lightweight yet powerful — Gemma-2-2B is manageable on consumer GPUs.
- Supports LoRA/QLoRA — Enabling fine-tuning with minimal memory consumption.
- Faster training & inference — Thanks to 4-bit quantization, making it feasible without expensive GPUs.
Dataset Overview
This project uses Gretel’s Synthetic Text-to-SQL dataset, the largest open-source dataset of its kind, containing over 100,000 high-quality synthetic Text-to-SQL samples. It’s designed to enhance AI model training by providing diverse and realistic SQL queries.
Dataset Structure
- sql_prompt — Natural language query (e.g., “Retrieve active defense contracts.”)
- sql_context — Schema information (e.g., “Table: contracts, Columns: contract_id, start_date, status”)
- sql — Expected SQL output
- sql_explanation — Explanation of SQL logic
Load and Format the Dataset:
- The Gretel synthetic text-to-SQL dataset is loaded using the load_dataset function.
- The formatting_prompts_func is applied to the dataset to format the prompts.
- The dataset is split into training and testing sets, with 80% of the data used for training and 20% for testing.
- The train_test_split method is used to perform the split, ensuring reproducibility with a fixed seed.
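The split step above can be sketched in plain Python (the real project uses Hugging Face’s load_dataset and the dataset’s built-in train_test_split; the toy records and seed value here are illustrative stand-ins):

```python
import random

# Toy stand-ins for Gretel records; real data comes from load_dataset(...).
records = [
    {"sql_prompt": f"question {i}", "sql_context": "Table: t",
     "sql": "SELECT 1;", "sql_explanation": "demo"}
    for i in range(100)
]

def train_test_split(data, test_size=0.2, seed=3407):
    """80/20 split with a fixed seed for reproducibility,
    mirroring the semantics of datasets' train_test_split."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_size))
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split(records)
print(len(train), len(test))  # 80 20
```

Fixing the seed means every run produces the same split, so evaluation numbers stay comparable across experiments.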
Formatting Prompts for Fine-Tuning
The alpaca_prompt template structures each training example to include the schema context, the natural-language question, the target SQL query, and its explanation. A formatting function applies this template to every record of Gretel’s synthetic text-to-SQL dataset before fine-tuning.
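As a rough illustration, here is what an Alpaca-style template and formatting function might look like (the exact template wording and field mapping in the project may differ):

```python
# Sketch of an Alpaca-style template; the project's exact wording may differ.
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

def formatting_prompts_func(example):
    """Fill the template with one Gretel record; the explanation is appended to the SQL."""
    text = alpaca_prompt.format(
        "Convert the question to SQL using the given schema.\n" + example["sql_context"],
        example["sql_prompt"],
        example["sql"] + "\n\nExplanation: " + example["sql_explanation"],
    )
    return {"text": text}

sample = {
    "sql_prompt": "Retrieve active defense contracts.",
    "sql_context": "Table: contracts, Columns: contract_id, start_date, status",
    "sql": "SELECT * FROM contracts WHERE status = 'active';",
    "sql_explanation": "Filters contracts whose status is active.",
}
print(formatting_prompts_func(sample)["text"])
```

Keeping the schema in the Instruction section and the question in the Input section gives the model a consistent place to look for each piece of context.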
LoRA (Low-Rank Adaptation) — The Key to Efficient Fine-Tuning
Instead of updating all model weights (which is computationally expensive), LoRA modifies only a subset of parameters. This reduces memory usage while still achieving fine-tuning benefits.
How it works:
🔹 LoRA adds small trainable adapter layers while freezing the pre-trained model.
🔹 Only low-rank updates are applied, making it much more efficient.
🔹 Drastically reduces VRAM requirements compared to full fine-tuning.
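The efficiency claim is easy to verify with back-of-the-envelope arithmetic. LoRA replaces the full weight update for a matrix W with a low-rank product B·A, so only B and A are trained. Assuming a hypothetical hidden size of 4096 and LoRA rank 16:

```python
# For a weight matrix W of shape (d, d), LoRA learns B (d x r) and A (r x d)
# and applies W + (alpha / r) * B @ A, keeping W itself frozen.
d, r = 4096, 16  # hypothetical hidden size and LoRA rank

full_update_params = d * d        # parameters touched by full fine-tuning
lora_params = d * r + r * d       # parameters LoRA actually trains

print(full_update_params)  # 16777216
print(lora_params)         # 131072
print(f"{lora_params / full_update_params:.2%} of the full update")  # 0.78%
```

Under one percent of the trainable parameters per matrix is why LoRA fits on consumer GPUs.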
QLoRA (Quantized LoRA) — Taking Efficiency Further
While LoRA is great, QLoRA takes it a step further by using 4-bit quantization. This means:
✅ Lower memory footprint — Model weights are compressed to 4-bit precision.
✅ Faster training & inference — Less computational overhead.
✅ Accuracy close to full-precision fine-tuning — thanks to precision-preserving optimizations such as NF4 quantization.
Using QLoRA, I fine-tuned the model on consumer-grade GPUs, making it highly cost-effective and scalable.
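For reference, here is roughly what the Unsloth setup looks like. This is a non-runnable configuration sketch: the model name, rank, and target modules are illustrative choices, and actually running it requires a GPU.

```python
from unsloth import FastLanguageModel

# Load a 4-bit quantized base model (QLoRA-style).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-2-2b-bnb-4bit",  # illustrative model name
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; only these small matrices are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,              # LoRA rank (illustrative)
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```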
Structuring the Input Prompt
To get accurate SQL queries, I designed a structured prompt that pairs the database schema with the user’s question and asks the model to explain its answer. This ensures:
✅ The model understands the query intent.
✅ The database schema helps generate schema-aware SQL.
✅ The model explains the SQL, improving interpretability.
Example input-output:
Input:
“Find the total revenue from orders placed in the last 30 days.”
Generated SQL:
SELECT SUM(order_total)
FROM orders
WHERE order_date >= DATE_SUB(CURDATE(), INTERVAL 30 DAY);
Explanation:
“This query calculates the total revenue by summing order_total for all orders placed within the last 30 days using DATE_SUB.”
This structured format significantly improved output coherence.
❌ Before Fine-Tuning:
The model generated SQL with missing conditions and incorrect table references.
✅ After Fine-Tuning:
- Queries were more accurate, with correct table joins.
- The model better understood database context and produced SQL closer to human-written queries.
Unlike classification tasks, evaluating SQL correctness is non-trivial. Here’s how I tackled it:
1️⃣ Exact Match Accuracy
- Compare the generated SQL with the ground-truth SQL.
- Works well for simple queries but fails when different SQL formulations are logically equivalent.
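A minimal sketch of exact-match scoring, with light normalization so pure formatting differences don’t count as mismatches (the normalization rules here are my own illustrative choices):

```python
import re

def normalize_sql(sql: str) -> str:
    """Crude normalization: lowercase, collapse whitespace, drop trailing semicolon."""
    sql = re.sub(r"\s+", " ", sql.strip().lower())
    return sql.rstrip(";").strip()

def exact_match(predicted: str, gold: str) -> bool:
    return normalize_sql(predicted) == normalize_sql(gold)

gold = "SELECT SUM(order_total) FROM orders;"
print(exact_match("select sum(order_total)\nFROM orders", gold))     # True: only formatting differs
print(exact_match("SELECT SUM(o.order_total) FROM orders o", gold))  # False, though logically equivalent
```

The second call shows the weakness: an aliased but logically identical query scores zero, which motivates the execution-based check below.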
2️⃣ Execution-Based Evaluation
- Run both generated SQL and expected SQL on a test database.
- Compare execution results instead of raw SQL text.
- More robust since different SQL formulations can yield the same results.
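Execution-based comparison can be sketched with Python’s built-in sqlite3 module (the toy schema and queries below are illustrative; the project’s test database may differ):

```python
import sqlite3

def execution_match(pred_sql: str, gold_sql: str, setup_sql: str) -> bool:
    """Run both queries against a throwaway in-memory database and compare result sets."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(setup_sql)
    try:
        pred = sorted(conn.execute(pred_sql).fetchall())
        gold = sorted(conn.execute(gold_sql).fetchall())
        return pred == gold
    except sqlite3.Error:
        return False  # invalid SQL can never match
    finally:
        conn.close()

setup = """
CREATE TABLE orders (order_id INTEGER, order_total REAL);
INSERT INTO orders VALUES (1, 10.0), (2, 15.5), (3, 4.5);
"""
# Differently written but logically equivalent queries agree under execution.
print(execution_match("SELECT SUM(order_total) FROM orders",
                      "SELECT TOTAL(order_total) FROM orders", setup))  # True
```

Sorting the fetched rows makes the comparison order-insensitive, which matters for queries without an ORDER BY clause.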
To make the model interactive, I built a Gradio-based UI where users can:
🔹 Enter a natural language query
🔹 Choose between the baseline & fine-tuned model
🔹 Receive SQL + Explanation instantly
This makes the model easily accessible, allowing users to test SQL generation accuracy in real-time!
🚀 QLoRA + 4-bit quantization = highly efficient fine-tuning with minimal computational cost.
📌 Prompt design matters — well-structured prompts significantly improved SQL accuracy.
⚡ Evaluation is non-trivial — correct SQL syntax ≠ correct execution, so query validation is crucial.
💡 Fine-tuned models outperform general-purpose LLMs on domain-specific SQL generation.
This project deepened my understanding of efficient fine-tuning techniques and practical AI model evaluation — and I’d love to discuss this with others working on LLMs, text-to-SQL, or fine-tuning!
The ability to fine-tune LLMs for specific tasks is a powerful skill in modern AI development. Text-to-SQL conversion is just one use case — these techniques can be extended to financial data extraction, legal document analysis, and more.
I hope this blog helps anyone exploring efficient LLM fine-tuning, Text-to-SQL models, and QLoRA techniques. Let me know your thoughts, and feel free to reach out if you’re working on similar projects! 🚀
#LLM #FineTuning #TextToSQL #QLoRA #AI #MachineLearning #Unsloth #LoRA #NLP #DataScience