MaskSQL: Safeguarding Privacy for LLM-Based Text-to-SQL via Abstraction

VectorInstitute/masksql

 
 

MaskSQL


MaskSQL is a privacy-preserving framework for LLM-based text-to-SQL that uses schema masking and progressive unmasking to protect sensitive database information while maintaining high query accuracy.

Installation and Setup Instructions

Docker Installation

Set up the environment variables:

cp .env.example .env

Fill in the required variables in the new .env file.

Run MaskSQL using the published Docker image:

docker compose run --rm masksql python main.py

Build the Docker image

Build the Docker image locally:

docker compose -f docker-compose.local.yaml build

Interactive shell

Start the MaskSQL container in the background so that you can open a shell inside it:

docker compose up -d
# Or
docker compose -f docker-compose.local.yaml up -d

Once the container has started successfully, open a shell in it:

docker compose exec -it masksql bash
# Or
docker compose -f docker-compose.local.yaml exec -it masksql bash

Native Installation

Requirements

  • Python 3.11
  • uv package manager

Setup Environment

Install dependencies and activate the virtual environment:

uv sync --dev
source .venv/bin/activate

Download Dataset

Download and extract the dataset:

wget -O data.zip "https://www.dropbox.com/scl/fi/vtraf79vfi1x105veaflk/data.zip?rlkey=7yq6d46aer6h45pdihrc9rht1&st=zdac3rqx&dl=0"
unzip data.zip

Expected directory structure:

data/
├── databases/
├── 1_input.json
└── ...
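
As a quick sanity check after extraction, the layout above can be verified with a small script. This is only a sketch: the function name `missing_data_entries` is hypothetical and not part of MaskSQL, and it checks only the two entries named in the listing, since the remaining files are elided there.

```python
from pathlib import Path

# Entries taken from the directory listing above. The "..." in that listing
# means other files exist that this sketch does not know about.
EXPECTED_ENTRIES = ("databases", "1_input.json")


def missing_data_entries(data_dir="data"):
    """Return the expected entries that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_data_entries()
    if missing:
        print("Missing from data/:", ", ".join(missing))
    else:
        print("data/ contains the expected entries.")
```

An empty result means both listed entries were found; it does not confirm that the rest of the dataset is complete.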

Configure Environment

Create a .env file from the template:

cp .env.example .env

Required:

Optional:

  • LIMIT: Number of dataset entries to process (e.g., LIMIT=10)
  • START: Starting index in the dataset (default: 0)
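
To illustrate how the two optional variables might interact, here is a minimal sketch. It assumes that START is a zero-based index and that LIMIT caps the number of entries taken from START onward; the helper `select_entries` is hypothetical and is not MaskSQL's actual implementation.

```python
import os


def select_entries(entries, env=None):
    """Return the slice of entries selected by START and LIMIT.

    Assumptions (not confirmed by the project): START is zero-based,
    LIMIT counts entries from START onward, and an unset or empty LIMIT
    means "no cap".
    """
    env = os.environ if env is None else env
    start = int(env.get("START", "0"))
    limit = env.get("LIMIT")
    if limit is None or limit == "":
        return entries[start:]
    return entries[start:start + int(limit)]


if __name__ == "__main__":
    dataset = list(range(100))  # stand-in for the real dataset entries
    subset = select_entries(dataset, {"START": "5", "LIMIT": "10"})
    print(len(subset), subset[0], subset[-1])
```

Under these assumptions, START=5 with LIMIT=10 would process entries 5 through 14.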

Running MaskSQL

Configuration

MaskSQL reads its configuration from the configs/conf.yaml file by default. To use a different configuration file, pass its path with the --config option of the CLI.

1. Run RESDSQL (Schema Filtering)

MaskSQL requires RESDSQL for initial schema filtering. Follow the RESDSQL setup instructions to generate the required files.

2. Run the Pipeline

Execute the MaskSQL pipeline:

python3 main.py

To clean previous outputs and rerun:

python3 main.py --clean
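
The two command-line options mentioned in this README, --config and --clean, could be parsed along the following lines. This is an illustrative sketch built on Python's standard argparse module, not the actual contents of main.py; the function name `build_parser` and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Build a parser for the two options mentioned in this README."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument(
        "--config",
        default="configs/conf.yaml",  # default path stated in this README
        help="path to the configuration file",
    )
    parser.add_argument(
        "--clean",
        action="store_true",  # a flag without a value, as in `main.py --clean`
        help="clean previous outputs before running",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("config:", args.config)
    print("clean:", args.clean)
```

With a parser like this, running the script without arguments falls back to configs/conf.yaml and leaves previous outputs in place.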

Documentation

Citation

If you use MaskSQL in your research, please cite our paper:

@article{abedini2025masksql,
  title={MaskSQL: Safeguarding Privacy for LLM-Based Text-to-SQL via Abstraction},
  author={Abedini, Sepideh and Mohapatra, Shubhankar and Emerson, DB and Shafieinejad, Masoumeh and Cresswell, Jesse C and He, Xi},
  journal={arXiv preprint arXiv:2509.23459},
  year={2025}
}

Paper: https://arxiv.org/abs/2509.23459
