AssistedVision is an end-to-end computer vision system that processes video input in real time, detects objects, estimates their distance, evaluates risk, identifies free walking gaps, predicts safe turning directions, and generates spoken audio feedback.
It integrates several CV components—YOLOv8 detection, MiDaS depth estimation, Kalman tracking, risk scoring, passable-gap detection, turn prediction, and optional mobile audio streaming—into one complete assistive perception pipeline.
Object detection (detection.py):
- Runs via Ultralytics YOLOv8
- Performs detection every frame
- Filters predictions by confidence threshold
- Produces bounding boxes + class labels
Depth estimation (depth.py):
- Uses MiDaS (DPT-based) model
- Produces full-frame depth map
- Extracts median depth per object bounding box
- Provides relative distance estimation
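The median-per-box step can be sketched in a few lines. This is a minimal illustration assuming a NumPy depth map; `median_depth` is a hypothetical helper, not the project's exact code:

```python
import numpy as np

def median_depth(depth_map: np.ndarray, box) -> float:
    """Return the median depth inside a bounding box (x1, y1, x2, y2).

    The median is robust to background pixels that leak into the box,
    which is why it is preferable to the mean here.
    """
    x1, y1, x2, y2 = (int(v) for v in box)
    region = depth_map[max(y1, 0):y2, max(x1, 0):x2]
    if region.size == 0:  # degenerate or off-frame box
        return float("nan")
    return float(np.median(region))
```

Because MiDaS depth is relative rather than metric, the resulting value is best used for ranking objects by proximity, not for absolute distances.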
Object tracking (tracker.py):
- Kalman filter based tracker
- IoU association + ID persistence
- Smooth motion trajectories
- Handles temporary occlusions
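The IoU-association step might look like the following. This is a simplified greedy matcher for illustration only; the project's tracker also folds in Kalman-predicted positions:

```python
def iou(box_a, box_b) -> float:
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    if inter == 0:
        return 0.0
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union

def associate(tracks, detections, iou_threshold=0.3):
    """Greedy matching: each track claims its best unclaimed detection."""
    matches, used = {}, set()
    for t_id, t_box in tracks.items():
        best_j, best_iou = None, iou_threshold
        for j, d_box in enumerate(detections):
            if j in used:
                continue
            score = iou(t_box, d_box)
            if score > best_iou:
                best_j, best_iou = j, score
        if best_j is not None:
            matches[t_id] = best_j
            used.add(best_j)
    return matches
```

Tracks that find no match above the threshold are the ones that survive a few frames on Kalman prediction alone, which is how temporary occlusions are bridged.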
Risk is computed from:
- object distance (MiDaS depth)
- bbox area (proximity indication)
- object vertical position relative to horizon
- class-based rules (e.g., people, vehicles)
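One plausible way to combine those four cues is a weighted sum. The weights, the inverse-depth form, and the class list below are illustrative assumptions, not the project's tuned values:

```python
def risk_score(depth, bbox_area, frame_area, y_bottom, horizon_y, cls,
               high_risk_classes=("person", "car", "truck", "bus")):
    """Combine distance, box size, vertical position, and class into 0..1."""
    proximity = 1.0 / (1.0 + depth)              # nearer -> higher (depth is relative)
    size = min(bbox_area / frame_area, 1.0)      # bigger box -> closer -> riskier
    below_horizon = 1.0 if y_bottom > horizon_y else 0.3  # on the ground plane?
    class_weight = 1.0 if cls in high_risk_classes else 0.5
    score = class_weight * (0.5 * proximity + 0.3 * size + 0.2 * below_horizon)
    return min(score, 1.0)
```

A nearby person then scores well above a distant stationary object, which is the ordering the audio warnings need.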
The engine generates:
- high-risk audio warnings
- distance-aware messages
- object-position descriptions (“ahead”, “left”, “right”)
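Message construction can be as simple as splitting the frame into thirds and templating a sentence. The exact phrasing and the thirds split are assumptions for illustration:

```python
def describe(label: str, distance: float, x_center: float, frame_width: int) -> str:
    """Build a spoken message like the ones above (format is illustrative)."""
    third = frame_width / 3
    if x_center < third:
        position = "left"
    elif x_center < 2 * third:
        position = "ahead"
    else:
        position = "right"
    return f"{label} {position}, about {distance:.0f} meters"
```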
From path_finder.py:
- analyzes object bounding boxes
- identifies free vertical gaps
- chooses widest traversable path
- used to determine safe forward direction
- if no safe forward gap is found → turn prediction
- chooses left or right based on depth + empty space
- audio output:
  - “Turn slightly left”
  - “Turn right”
  - “Clear ahead”
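The core of gap detection is an interval sweep over the horizontal extents of the boxes. The sketch below conveys the idea behind path_finder.py but is not its actual code:

```python
def widest_gap(boxes, frame_width):
    """Return (start, end) of the widest free horizontal span between boxes.

    Boxes are (x1, y1, x2, y2); only their horizontal extent matters here.
    """
    # Collect occupied x-intervals and merge any that overlap
    occupied = sorted((b[0], b[2]) for b in boxes)
    merged = []
    for x1, x2 in occupied:
        if merged and x1 <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], x2))
        else:
            merged.append((x1, x2))
    # Scan the free spans between merged intervals
    gaps, cursor = [], 0
    for x1, x2 in merged:
        if x1 > cursor:
            gaps.append((cursor, x1))
        cursor = max(cursor, x2)
    if cursor < frame_width:
        gaps.append((cursor, frame_width))
    if not gaps:
        return None
    return max(gaps, key=lambda g: g[1] - g[0])
```

Once the widest gap is known, choosing between "Clear ahead" and a turn reduces to comparing the gap's center with the frame center.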
Objects are mapped to relative angular zones:
- “12 o’clock”
- “2 o’clock”
- “9 o’clock”
for intuitive spoken feedback.
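The clock-zone mapping can be derived from the object's horizontal position and an assumed camera field of view. Both the 90-degree FOV default and the function itself are illustrative assumptions:

```python
def clock_zone(x_center: float, frame_width: int, fov_degrees: float = 90.0) -> str:
    """Map a horizontal image position to a clock direction.

    Straight ahead is 12 o'clock; the camera is assumed to span
    `fov_degrees` horizontally.
    """
    # Angle of the object relative to the optical axis, in degrees
    angle = (x_center / frame_width - 0.5) * fov_degrees
    # 30 degrees per clock hour; round to the nearest hour
    hour = (12 + round(angle / 30.0)) % 12 or 12
    return f"{hour} o'clock"
```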
Two modes:
- Desktop audio:
  - Uses Windows TTS (PowerShell SAPI)
  - Plays prioritized spoken messages
- Mobile audio:
  - Browser-based audio via WebSocket
  - Phone receives all spoken guidance
  - Phone can send gyro data back for future use
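Driving Windows SAPI through PowerShell can be sketched as building and running a one-line PowerShell script. This is an illustration of the approach named above; tts.py may differ in detail:

```python
def build_tts_command(text: str) -> list[str]:
    """Build a PowerShell command that speaks `text` via Windows SAPI."""
    escaped = text.replace("'", "''")  # escape single quotes for PowerShell
    script = ("Add-Type -AssemblyName System.Speech; "
              "(New-Object System.Speech.Synthesis.SpeechSynthesizer)"
              f".Speak('{escaped}')")
    return ["powershell", "-NoProfile", "-Command", script]

# On Windows one would then run it with, e.g.:
# subprocess.run(build_tts_command("Turn slightly left"), check=False)
```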
To enable mobile mode:
python src/main.py --camera 0 --mobile

Then open mobile.html on your phone (must be on the same WiFi network).
┌──────────────────────────────┐
│ Camera / Video │
└───────────────┬──────────────┘
▼
┌──────────────────────────────┐
│ YOLOv8 Detection │
└───────────────┬──────────────┘
▼
┌──────────────────────────────┐
│ Kalman Tracking │
└───────────────┬──────────────┘
▼
┌──────────────────────────────┐
│ MiDaS Depth Model │
└───────────────┬──────────────┘
▼
┌─────────────────────────────────────────────────────┐
│ Risk Engine + Gap Detection + Turning │
└───────────────┬─────────────────────────────────────┘
▼
┌──────────────────────────────┐
│ Audio Output (TTS) │
└──────────────────────────────┘
To set up the environment and run the project, follow these steps:
git clone https://github.com/<your-username>/AssistedVision.git
cd AssistedVision
conda create -n assistedvision python=3.10 -y
conda activate assistedvision
pip install -r requirements.txt

Project structure:

AssistedVision/
├── data/
│ ├── raw/ # Raw video samples (optional)
│ ├── processed/ # Saved processed outputs
│ ├── samples/ # Demo input videos
│ └── myvideo.mp4 # Example user-added video
│
├── src/
│ ├── main.py # Main pipeline: detection + depth + risk + audio + UI
│ ├── detection.py # YOLOv8 detection wrapper
│ ├── depth.py # MiDaS depth estimation
│ ├── tracker.py # Kalman filter + object tracking
│ ├── risk.py # Rule-based risk engine
│ ├── prob_risk.py # Probabilistic risk scoring (experimental)
│ ├── path_finder.py # Gap detection + navigation decision module
│ ├── instaYOLO_seg.py # YOLO segmentation (optional module)
│ ├── tts.py # Text-to-speech (Windows SAPI)
│ ├── viz.py # Visualization utilities and drawing
│ ├── utils.py # Helper functions
│ ├── mobile_server.py # WebSocket server for phone audio + gyro
│ └── __init__.py # Package initializer
│
├── mobile.html # Mobile companion interface (audio + gyroscope)
│
├── requirements.txt # Python dependencies
├── environment.yml # Conda environment file (optional)
│
├── run.sh # Linux run script (optional)
├── start_mobile.bat # Windows helper script (mobile mode)
├── test_webcam.ps1 # Webcam test script for Windows
│
├── README.md # Project documentation
├── LICENSE # MIT license
└── COMPREHENSIVE_PROJECT_SUMMARY.md # Full technical write-up
AssistedVision can be run in three main modes:
- Real-time mode using your webcam
- Offline mode using a video file
- Android mobile mode (phone camera + phone audio + gyro)
- Connect your webcam.
- Open a terminal inside the project folder.
- Run the command below:
python src/main.py --camera 0 --yolo yolov8n.pt --output webcam_output.mp4

Step 1 — Put your video inside the data/ folder:
data/myvideo.mp4

Step 2 — Run this command:
python src/main.py --video data/myvideo.mp4 --yolo yolov8n.pt --output result.mp4

Note: Full mobile mode (phone camera + audio + gyro) is currently tested and supported on Android.
In this mode, your Android phone acts as the camera and audio device, while your laptop runs all computer vision and navigation logic.
- On your Android phone, install “IP Webcam” from the Google Play Store.
- Open the app and scroll down to “Start server”.
- Tap Start server.
- At the bottom of the screen, you will see a URL like:
http://192.168.0.15:8080/video
This is your phone camera stream URL. You will use this in the --camera argument.
Make sure the phone and laptop are on the same WiFi network.
In a terminal on your laptop, inside the project folder, run:
python src/main.py \
--camera http://PHONE_IP:8080/video \
--yolo yolov8n.pt \
  --mobile

Example:
python src/main.py --camera http://192.168.0.15:8080/video --yolo yolov8n.pt --mobile

This will:
- Use the Android phone camera as the video source
- Run YOLOv8, depth estimation, tracking, risk analysis, and navigation on the laptop
- Start the mobile communication server for audio + gyroscope
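The laptop-to-phone traffic is small JSON messages over the WebSocket. The schema below is an assumption for illustration; the actual protocol lives in mobile_server.py and mobile.html:

```python
import json

def guidance_message(text: str, priority: int = 1) -> str:
    """Serialize a spoken-guidance message for the phone (schema is assumed)."""
    return json.dumps({"type": "speak", "text": text, "priority": priority})

def parse_gyro(raw: str):
    """Parse a gyro reading sent back by the phone (same schema caveat)."""
    msg = json.loads(raw)
    if msg.get("type") != "gyro":
        return None
    return (msg["alpha"], msg["beta"], msg["gamma"])
```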
From the project root on the laptop, start a simple HTTP server:
python -m http.server 8000

This makes mobile.html available over the network.
- On the same Android phone, open Chrome.
- In the address bar, go to:
http://YOUR_PC_IP:8000/mobile.html
For example:
http://192.168.0.20:8000/mobile.html
- When the page loads, allow:
  - Motion / gyroscope access
  - Audio permissions if prompted
- The status on the page should indicate that the phone is connected to the PC.
If the stream is slow, you can add:
--imgsz 320 --skip-depth 5 --process-every 2

Example:
python src/main.py \
--camera http://192.168.0.15:8080/video \
--yolo yolov8n.pt \
--mobile \
--imgsz 320 \
--skip-depth 5 \
--process-every 2