A unified framework for benchmarking PCA vs SparsePCA pipelines under adversarial attacks. This repository evaluates the robustness of dimensionality reduction techniques by building end-to-end differentiable pipelines and testing them against various adversarial attacks.
- Overview
- Requirements
- Installation
- Quick Start
- Command Line Interface
- Advanced Usage
- Output Files
- Troubleshooting
## Overview

The framework builds end-to-end differentiable pipelines consisting of:
- Fixed linear projection (PCA or SparsePCA) as the first layer
- Fixed StandardScaler for normalization
- Small trainable MLP classifier
Adversarial attacks are implemented using IBM's Adversarial Robustness Toolbox (ART), enabling gradients to flow through the fixed preprocessing layers. The system supports multiple datasets and attack types with comprehensive evaluation metrics.
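As a rough analogue of this pipeline, the projection → scaling → classifier ordering can be sketched with scikit-learn. This is illustrative only: the repository's actual pipeline is a differentiable PyTorch module attacked through ART, and the dataset, component count, and MLP size below are stand-ins, not the repo's configuration.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Projection -> normalization -> small MLP, mirroring the layer order above
X, y = load_digits(return_X_y=True)
pipe = make_pipeline(
    PCA(n_components=32),        # fixed linear projection (PCA; SparsePCA is analogous)
    StandardScaler(),            # fixed normalization
    MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0),
)
pipe.fit(X[:1200], y[:1200])
accuracy = pipe.score(X[1200:], y[1200:])
```

Once fitted, the PCA and scaler stages stay frozen at inference time, which is what lets gradients in the real (PyTorch) pipeline flow through them to the raw input.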
## Requirements

- Python 3.10+ recommended
- pip 22+ recommended
- Optional: CUDA-capable GPU with matching PyTorch build
All Python dependencies are listed in `requirements.txt`. Datasets are downloaded automatically:

- MNIST via `sklearn.datasets.fetch_openml`
- CIFAR-10 via `torchvision.datasets.CIFAR10` into `./cifar_data`
## Installation

### Windows (PowerShell)

```powershell
# Clone and navigate to the repository
git clone <repository-url>
cd SPCARobustness

# Create and activate virtual environment
python -m venv .venv
.\.venv\Scripts\Activate.ps1

# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt
```

For GPU acceleration with CUDA 12.1:

```powershell
pip uninstall -y torch torchvision
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
```

### Linux/macOS

```bash
# Clone and navigate to the repository
git clone <repository-url>
cd SPCARobustness

# Create and activate virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt
```

For GPU acceleration with CUDA 12.1:

```bash
pip uninstall -y torch torchvision
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
```

## Quick Start

Run MNIST experiments with all available attacks:
```bash
python main.py --dataset mnist --attacks ALL
```

Run with specific attacks and custom parameters:
```bash
python main.py --dataset mnist --attacks FGSM PGD MIM --n-components 100 150 200 --eps-start 0.01 --eps-end 0.2 --save-samples
```

Run CIFAR-10 binary classification (airplane vs frog) with all attacks:
```bash
python main.py --dataset cifar-binary --attacks ALL --eps-end 0.1
```

Run with a specific configuration for faster execution:
```bash
python main.py --dataset cifar-binary --attacks FGSM PGD MIM SQUARE --n-components 100 150 --eps-start 0.01 --eps-end 0.1 --n-samples 5000 --save-models
```

## Command Line Interface

### Datasets

- `mnist`: MNIST 10-class digit classification (28x28 grayscale)
- `cifar-binary`: CIFAR-10 binary classification (airplane vs frog, 32x32 RGB)

### Attacks

- `FGSM`: Fast Gradient Sign Method
- `PGD`: Projected Gradient Descent
- `MIM`: Momentum Iterative Method
- `SQUARE`: Square Attack (black-box)
Use `--attacks ALL` to run all attacks, or specify individual attacks: `--attacks FGSM PGD MIM`
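For intuition about the simplest of these attacks, FGSM's core step is a single signed-gradient perturbation. The sketch below uses a toy NumPy logistic model with an analytic input gradient; it is not the ART implementation the repository uses, and `w`, `x`, and `y` are illustrative values.

```python
import numpy as np

def fgsm_perturb(x, grad, eps):
    # FGSM: step each input feature by eps in the direction that increases the loss
    return x + eps * np.sign(grad)

# Toy binary logistic model p = sigmoid(w @ x); for cross-entropy loss,
# the gradient of the loss w.r.t. the input is (p - y) * w.
rng = np.random.default_rng(0)
w = rng.normal(size=8)
x = rng.normal(size=8)
y = 1.0  # true label

p = 1.0 / (1.0 + np.exp(-w @ x))      # clean confidence in class 1
grad = (p - y) * w                    # analytic input gradient
x_adv = fgsm_perturb(x, grad, eps=0.1)
p_adv = 1.0 / (1.0 + np.exp(-w @ x_adv))  # adversarial confidence (lower than p)
```

PGD and MIM iterate this step (with projection and momentum, respectively), which is why they generally degrade accuracy faster at the same epsilon budget.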
| Parameter | Description | Default |
|---|---|---|
| `--dataset` | Dataset choice (`mnist`, `cifar-binary`) | `mnist` |
| `--attacks` | List of attacks or `ALL` | Uses `--attack` value |
| `--attack` | Single attack if `--attacks` not specified | `FGSM` |
| `--norm` | Attack norm (`2` for L2, `inf` for L∞) | `2` |
| `--n-components` | PCA/SPCA component counts | `[100, 150, 200]` |
| `--eps-start` | Starting epsilon value | `0.01` |
| `--eps-end` | Ending epsilon value | `0.2` |
| `--eps-step` | Epsilon step size | `0.01` |
| `--n-samples` | Limit dataset size for speed | `None` (full dataset) |
| `--epochs` | Training epochs for MLP | `20` |
| `--save-samples` | Save adversarial sample visualizations | `False` |
| `--save-models` | Cache trained models to disk | `False` |
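The three epsilon flags define the attack-strength sweep each pipeline is evaluated over. Assuming the endpoint is inclusive (an assumption, not confirmed by the CLI help), the default values expand to:

```python
import numpy as np

# Reconstruction of the sweep implied by --eps-start/--eps-end/--eps-step defaults
eps_start, eps_end, eps_step = 0.01, 0.2, 0.01
# Half-step padding keeps the endpoint despite floating-point drift
eps_values = np.round(np.arange(eps_start, eps_end + eps_step / 2, eps_step), 4)
# 20 values: 0.01, 0.02, ..., 0.2
```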
## Advanced Usage

Enable model caching for faster repeated experiments:
```bash
python main.py --dataset mnist --attacks ALL --save-models --models-dir cached_models
```

For faster execution during development:
```bash
python main.py --dataset mnist --attacks FGSM PGD --n-components 64 128 --n-samples 2000 --epochs 10 --attack-n-test 1000
```

Fine-tune attack parameters:
```bash
python main.py --dataset cifar-binary --attacks SQUARE --square-max-iter 500 --square-restarts 3 --attack-batch-size 128
```

## Output Files

The system generates several types of output files:
**Robustness Plots:**

- Format: `{dataset}_{attack}_norm_{norm}_eps_{start}_to_{end}_ncomp_{components}_nsamples_{n}.png`
- Example: `mnist_fgsm_norm_l2_eps_0.01_to_0.2_ncomp_100_to_200_nsamples_60000.png`
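The naming scheme can be reproduced with a small helper. This `plot_filename` function is hypothetical (the repository assembles these names internally); it simply spells out how the placeholders map to the example filename.

```python
def plot_filename(dataset, attack, norm, eps_start, eps_end,
                  ncomp_min, ncomp_max, n_samples):
    """Assemble a robustness-plot filename following the documented pattern."""
    return (f"{dataset}_{attack}_norm_{norm}_eps_{eps_start}_to_{eps_end}"
            f"_ncomp_{ncomp_min}_to_{ncomp_max}_nsamples_{n_samples}.png")

name = plot_filename("mnist", "fgsm", "l2", 0.01, 0.2, 100, 200, 60000)
# -> mnist_fgsm_norm_l2_eps_0.01_to_0.2_ncomp_100_to_200_nsamples_60000.png
```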
**Adversarial Sample Visualizations** (when `--save-samples` is enabled):

- Directory format: `adv_samples_{attack}_norm_{norm}_eps_{start}_to_{end}_ncomp_{components}_nsamples_{n}/`
- Contains side-by-side comparisons of clean vs adversarial examples
**Cached Models** (when `--save-models` is enabled):

- Directory: `models/` (or custom via `--models-dir`)
- Contains serialized PCA/SPCA transformations and trained classifiers
## Troubleshooting

**PyTorch Installation Issues:** Ensure you install a build matching your platform and CUDA version. See the PyTorch installation guide.

**Memory Issues:** Reduce batch sizes or limit the number of test samples:
```bash
python main.py --attack-batch-size 64 --attack-n-test 2000
```

**Long Runtimes:** Use fewer components and smaller datasets for initial testing:

```bash
python main.py --n-components 64 --n-samples 5000 --epochs 5
```

**CUDA Out of Memory:** Disable GPU or reduce batch sizes:
```bash
# Force CPU usage
CUDA_VISIBLE_DEVICES="" python main.py --dataset mnist
```