SmartData-Polito/ShortcutCatcher

ShortcutCatcher

Machine learning has significantly advanced encrypted traffic classification, but deployed models often fail because of shortcut features: spurious correlations learned during training that do not generalize to real-world environments.

ShortcutCatcher is a model-agnostic framework designed to automatically detect and mitigate these shortcuts using explainable AI techniques.

📖 How to install

conda create -n ag python=3.10
conda activate ag
pip install -U uv
python -m uv pip install autogluon==1.4
uv pip install "autogluon.tabular[tabicl]"
uv pip install matplotlib

📖 How to run

Example run:

python -u main.py \
  --root_path dataset \
  --experiment per-flow \
  --dataset app53-time-s2 \
  --description soft_window \
  --model_name RandomForestGini \
  --noise_type removal \
  --min_bound 0.0 \
  --importance_type default \
  --window 5 \
  --rounds 100

For the paper tables and figures, see how_to_reproduce.md for the full batch commands.

🔍 Key Idea

ShortcutCatcher contrasts model behavior across two datasets:

  • A training dataset used for model learning
  • A verification dataset representing a different scenario but sharing the same feature schema

By analyzing discrepancies in feature importance across these datasets, the framework identifies features that act as shortcuts and are unlikely to hold in deployment.
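The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's implementation: the function names, the feature names, the scores, and the "importance share drops by more than a threshold" rule are all assumptions made for the example. The README does not specify the discrepancy measure ShortcutCatcher actually uses.

```python
def normalize(importances):
    """Scale non-negative importance scores so they sum to 1."""
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importance scores must have a positive sum")
    return {name: score / total for name, score in importances.items()}


def flag_shortcuts(train_importance, verif_importance, threshold=0.1):
    """Return features whose importance share drops by more than `threshold`
    when moving from the training to the verification dataset.

    The drop criterion and the threshold value are illustrative assumptions.
    """
    train = normalize(train_importance)
    verif = normalize(verif_importance)
    # Positive gap: the model leans on the feature more on the training data.
    gaps = {name: train[name] - verif.get(name, 0.0) for name in train}
    flagged = [name for name, gap in gaps.items() if gap > threshold]
    # Largest discrepancy first.
    return sorted(flagged, key=lambda name: gaps[name], reverse=True)


# Made-up importance scores for three hypothetical flow features.
train_scores = {"pkt_size_mean": 0.2, "server_port": 0.6, "iat_mean": 0.2}
verif_scores = {"pkt_size_mean": 0.5, "server_port": 0.1, "iat_mean": 0.4}

suspects = flag_shortcuts(train_scores, verif_scores)
print(suspects)
```

In this toy input, only server_port is flagged: it dominates the explanation on the training data but carries little weight on the verification data, which is the pattern a shortcut feature is expected to show.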

⚙️ How It Works

ShortcutCatcher operates in a closed loop:

  1. Train a model on the training dataset
  2. Generate feature explanations (e.g., via XAI methods)
  3. Compare feature relevance across training and verification scenarios
  4. Detect unstable or spurious features
  5. Iteratively remove or mitigate these features
  6. Retrain and reevaluate the model
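The loop above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in main.py: the explain callback, the stopping threshold, the one-feature-per-round policy, and the toy scores are all hypothetical. Training and explanation are replaced by a stub so that the example runs with the standard library only.

```python
def shortcut_removal_loop(features, explain, threshold=0.1, max_rounds=10):
    """Repeatedly drop the feature with the largest train-vs-verification
    importance gap until no gap exceeds `threshold`.

    `explain(active)` is a hypothetical callback that trains a model on the
    active features and returns two dicts of importance shares:
    (train_importance, verification_importance).
    """
    active = list(features)
    removed = []
    for _ in range(max_rounds):
        if len(active) <= 1:
            break
        train_imp, verif_imp = explain(active)       # steps 1-3: train, explain, compare
        gaps = {f: train_imp[f] - verif_imp[f] for f in active}
        worst = max(gaps, key=gaps.get)              # step 4: most unstable feature
        if gaps[worst] <= threshold:
            break                                    # nothing left that looks like a shortcut
        active.remove(worst)                         # step 5: remove it
        removed.append(worst)                        # step 6: the next round retrains
    return active, removed


# Toy stand-in for "train a model and explain it": fixed raw scores,
# renormalized over whichever features are still active.
RAW_TRAIN = {"server_port": 6.0, "ttl": 3.0, "pkt_size_mean": 1.0, "iat_mean": 1.0}
RAW_VERIF = {"server_port": 1.0, "ttl": 1.0, "pkt_size_mean": 4.0, "iat_mean": 4.0}


def toy_explain(active):
    def shares(raw):
        total = sum(raw[f] for f in active)
        return {f: raw[f] / total for f in active}
    return shares(RAW_TRAIN), shares(RAW_VERIF)


kept, dropped = shortcut_removal_loop(RAW_TRAIN, toy_explain)
print("kept:", kept)
print("dropped:", dropped)
```

Removing one feature per round and recomputing the explanation matters because importance is redistributed once a dominant feature is gone: in the toy data, ttl only stands out as the main discrepancy after server_port has been removed.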

🚀 Contributions

  • ✅ Automated detection of shortcut features
  • ✅ Model-agnostic design (compatible with various ML architectures)
  • ✅ Improved cross-scenario generalization (up to 3× over standard training)
  • ✅ Identification of hidden dataset artifacts affecting performance
  • ✅ Realistic evaluation of encrypted traffic classification tasks

About

Detecting and mitigating shortcut features in encrypted traffic classification using explainable AI and cross-scenario evaluation.
