This repository documents the journey from fundamental Computer Vision concepts to the development of a fully autonomous drone tracking system. It contains the core flight control software, advanced vision utilities, and educational modules used to build the necessary visual perception skills.
This directory contains the production-grade code for the autonomous system:
- `track_PID.py`: The main autonomous flight script using PID and Kalman logic.
- `img_translation_detection_*.py`: Advanced algorithms for detecting precise pixel shifts between frames (Digital Image Stabilization logic).
Foundational scripts and exercises covering:
- Image Processing: Grayscale, Gaussian Blur, Edge Detection (Canny), and Morphological operations.
- Geometric Transformations: Homography and "Bird's Eye View" warping.
- Machine Learning: A CNN trained on the MNIST dataset for Optical Character Recognition (OCR).
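As an illustration of the Machine Learning item above, a minimal Keras CNN for MNIST digit classification might look like the sketch below; the layer sizes, optimizer, and epoch count are assumptions, not necessarily the architecture used in the repository's OCR module.

```python
import tensorflow as tf

# Load MNIST and scale pixel values to [0, 1]; add a channel dimension.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0
x_test = x_test[..., None] / 255.0

# Small convolutional network (hypothetical layer sizes).
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, validation_data=(x_test, y_test))
```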
The drone tracking system achieves autonomous behavior through three key control theories:
The drone steers based on the Visual Error $e$, the horizontal offset of the target from the image center:

- $e < 0$: Target is left $\rightarrow$ Strafe Left.
- $e > 0$: Target is right $\rightarrow$ Strafe Right.
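A minimal sketch of how this error could be derived from a detected bounding box; the frame width, bounding-box values, and command names are hypothetical, not taken from track_PID.py.

```python
# Hypothetical frame width and detected bounding box (x, y, w, h) in pixels.
FRAME_WIDTH = 640
bbox = (250, 180, 60, 90)

target_cx = bbox[0] + bbox[2] / 2     # horizontal centre of the target
error = target_cx - FRAME_WIDTH / 2   # e < 0: target left, e > 0: target right

if error < 0:
    command = "strafe_left"
elif error > 0:
    command = "strafe_right"
else:
    command = "hold"
print(command, error)
```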
To convert error into smooth motor commands, a Proportional-Integral-Derivative (PID) controller is used:
- $P$: Reacts to the current error (Speed).
- $I$: Corrects steady-state lag (Accuracy).
- $D$: Predicts future error to dampen oscillations (Stability).
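A minimal, self-contained PID sketch; the gains and time step are placeholder values, not the tuned constants from track_PID.py.

```python
class PID:
    """Simple PID controller operating on the visual error in pixels."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        # P: react to the current error.
        p = self.kp * error
        # I: accumulate error over time to remove steady-state lag.
        self.integral += error * dt
        i = self.ki * self.integral
        # D: use the error's rate of change to dampen oscillations.
        d = 0.0
        if self.prev_error is not None and dt > 0:
            d = self.kd * (error - self.prev_error) / dt
        self.prev_error = error
        return p + i + d


# Example: convert a pixel error into a lateral velocity command (placeholder gains).
pid = PID(kp=0.004, ki=0.0005, kd=0.002)
vy = pid.update(error=-120.0, dt=0.05)  # negative error -> strafe left (negative vy)
```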
A Linear Kalman Filter estimates the true state of the target from noisy frame-by-frame detections, smoothing the measurements fed to the PID controller.
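A minimal constant-velocity Kalman filter sketch using filterpy (listed in the install command below); the state layout and noise values are assumptions, not the repository's exact configuration.

```python
import numpy as np
from filterpy.kalman import KalmanFilter

dt = 0.05  # assumed frame interval in seconds

# State: [x, y, vx, vy]; measurement: pixel position [x, y] of the target.
kf = KalmanFilter(dim_x=4, dim_z=2)
kf.F = np.array([[1, 0, dt, 0],
                 [0, 1, 0, dt],
                 [0, 0, 1,  0],
                 [0, 0, 0,  1]], dtype=float)   # constant-velocity motion model
kf.H = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0]], dtype=float)    # we only measure position
kf.P *= 500.0                                   # initial uncertainty (assumed)
kf.R *= 10.0                                    # measurement noise (assumed)
kf.Q *= 0.01                                    # process noise (assumed)

# Each frame: predict, then update with the detector's pixel measurement.
kf.predict()
kf.update(np.array([320.0, 240.0]))
estimated_x, estimated_y = kf.x[0, 0], kf.x[1, 0]
```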
Recent updates include a hybrid engine to calculate the precise pixel shift between consecutive frames.
For pure translations, we calculate the shift in the frequency domain.

- Theory: A shift in space becomes a phase shift in frequency: $\mathcal{F}\{f(x-x_0)\} = F(u)e^{-i2\pi u x_0}$.
- Spectral Whitening: We normalize the cross-power spectrum to isolate phase information (see the sketch after this list):
  $$R = \frac{F_1 \cdot F_2^{*}}{|F_1 \cdot F_2^{*}|}$$
- Robustness: To handle lighting changes, we pre-process images using Sobel Edge Detection and Otsu Thresholding before the FFT. This tracks "structure" rather than "brightness."
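A minimal NumPy sketch of this normalized cross-power spectrum; the Sobel/Otsu pre-processing is omitted for brevity.

```python
import numpy as np

def phase_correlation(img1, img2):
    """Estimate the integer (dy, dx) shift of img2 relative to img1."""
    F1 = np.fft.fft2(img1.astype(float))
    F2 = np.fft.fft2(img2.astype(float))

    # Spectral whitening: keep only the phase of the cross-power spectrum.
    cross = F2 * np.conj(F1)
    R = cross / (np.abs(cross) + 1e-12)

    # The inverse FFT of R peaks at the translation.
    corr = np.fft.ifft2(R).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)

    # Unwrap shifts larger than half the image size (FFT wrap-around).
    if dy > img1.shape[0] // 2:
        dy -= img1.shape[0]
    if dx > img1.shape[1] // 2:
        dx -= img1.shape[1]
    return dy, dx, corr
```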
The FFT peak only gives integer accuracy. To obtain precise floating-point shifts, the integer peak is refined to sub-pixel precision.
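One common way to obtain sub-pixel precision is a three-point parabolic fit around the integer peak of the correlation surface. The sketch below reuses the `corr`, `dy`, and `dx` values from the previous example and is offered as one possible approach, not necessarily the repository's exact method.

```python
def parabolic_subpixel(corr, dy, dx):
    """Refine an integer correlation peak with a 1-D parabolic fit per axis."""
    def refine(c_minus, c_peak, c_plus):
        # Vertex of the parabola through three neighbouring samples.
        denom = c_minus - 2.0 * c_peak + c_plus
        if abs(denom) < 1e-12:
            return 0.0
        return 0.5 * (c_minus - c_plus) / denom

    rows, cols = corr.shape
    # Neighbouring correlation values with explicit wrap-around indexing.
    offset_y = refine(corr[(dy - 1) % rows, dx % cols],
                      corr[dy % rows, dx % cols],
                      corr[(dy + 1) % rows, dx % cols])
    offset_x = refine(corr[dy % rows, (dx - 1) % cols],
                      corr[dy % rows, dx % cols],
                      corr[dy % rows, (dx + 1) % cols])
    return dy + offset_y, dx + offset_x
```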
Phase correlation fails if the camera rotates or zooms. The system solves this with a two-step "Rectify & Correlate" pipeline:
- Feature Matching: Detects keypoints (ORB) and matches them between frames.
- RANSAC Homography: Estimates the transformation matrix $H$.
- Decision Logic (see the sketch after this list):
  - If $H$ is simple (translation only) $\rightarrow$ Use direct Phase Correlation.
  - If $H$ is complex (rotation/scale) $\rightarrow$ Warp (Rectify) the live image using $H^{-1}$, then apply Phase Correlation for the final alignment.
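A hedged sketch of such a pipeline with OpenCV; the feature count, translation tolerance, and RANSAC threshold are assumptions rather than the repository's exact values.

```python
import cv2
import numpy as np

def estimate_shift(ref_gray, live_gray):
    """Rectify & Correlate: ORB + RANSAC homography, then phase correlation."""
    orb = cv2.ORB_create(1000)
    kp1, des1 = orb.detectAndCompute(ref_gray, None)
    kp2, des2 = orb.detectAndCompute(live_gray, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # RANSAC homography mapping the reference frame to the live frame.
    H, _ = cv2.findHomography(pts1, pts2, cv2.RANSAC, 5.0)
    if H is None:
        return None  # not enough good matches

    # Decision logic: treat H as a pure translation if its linear part
    # is close to the identity (tolerance is an assumption).
    if np.allclose(H[:2, :2], np.eye(2), atol=0.02):
        aligned = live_gray
    else:
        # Rectify: undo rotation/scale by warping the live frame with H^-1.
        h, w = ref_gray.shape
        aligned = cv2.warpPerspective(live_gray, np.linalg.inv(H), (w, h))

    # Final sub-pixel alignment via OpenCV's phase correlation.
    (dx, dy), _response = cv2.phaseCorrelate(np.float32(ref_gray),
                                             np.float32(aligned))
    return dx, dy
```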
```bash
pip install airsim opencv-contrib-python filterpy numpy matplotlib pandas tensorflow
```