AssistedVision is an end-to-end computer vision system that processes video input in real time, detects objects, estimates their distance, evaluates risk, identifies free walking gaps, predicts safe turning directions, and generates spoken audio feedback.
It integrates several CV components—YOLOv8 detection, MiDaS depth estimation, Kalman tracking, risk scoring, passable-gap detection, turn prediction, and optional mobile audio streaming—into one complete assistive perception pipeline.
Object detection (detection.py):
- Runs via Ultralytics YOLOv8
- Performs detection every frame
- Filters predictions by confidence threshold
- Produces bounding boxes + class labels
Depth estimation (depth.py):
- Uses MiDaS (DPT-based) model
- Produces full-frame depth map
- Extracts median depth per object bounding box
- Provides relative distance estimation
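The median-per-box step can be sketched in a few lines. This is a minimal illustration assuming a NumPy depth map; `median_depth` is a hypothetical helper, not the project's exact code:

```python
import numpy as np

def median_depth(depth_map: np.ndarray, box) -> float:
    """Return the median depth inside a bounding box (x1, y1, x2, y2).

    The median is robust to background pixels that leak into the box,
    which is why it is preferable to the mean here.
    """
    x1, y1, x2, y2 = (int(v) for v in box)
    region = depth_map[max(y1, 0):y2, max(x1, 0):x2]
    if region.size == 0:  # degenerate or off-frame box
        return float("nan")
    return float(np.median(region))
```

Because MiDaS depth is relative rather than metric, the resulting value is best used for ranking objects by proximity, not for absolute distances.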
Object tracking (tracker.py):
- Kalman filter based tracker
- IoU association + ID persistence
- Smooth motion trajectories
- Handles temporary occlusions
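The IoU-association step might look like the following. This is a simplified greedy matcher for illustration only; the project's tracker also folds in Kalman-predicted positions:

```python
def iou(box_a, box_b) -> float:
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    if inter == 0:
        return 0.0
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union

def associate(tracks, detections, iou_threshold=0.3):
    """Greedy matching: each track claims its best unclaimed detection."""
    matches, used = {}, set()
    for t_id, t_box in tracks.items():
        best_j, best_iou = None, iou_threshold
        for j, d_box in enumerate(detections):
            if j in used:
                continue
            score = iou(t_box, d_box)
            if score > best_iou:
                best_j, best_iou = j, score
        if best_j is not None:
            matches[t_id] = best_j
            used.add(best_j)
    return matches
```

Tracks that find no match above the threshold are the ones that survive a few frames on Kalman prediction alone, which is how temporary occlusions are bridged.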
Risk is computed from:
- object distance (MiDaS depth)
- bbox area (proximity indication)
- object vertical position relative to horizon
- class-based rules (e.g., people, vehicles)
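One plausible way to combine those four cues is a weighted sum. The weights, the inverse-depth form, and the class list below are illustrative assumptions, not the project's tuned values:

```python
def risk_score(depth, bbox_area, frame_area, y_bottom, horizon_y, cls,
               high_risk_classes=("person", "car", "truck", "bus")):
    """Combine distance, box size, vertical position, and class into 0..1."""
    proximity = 1.0 / (1.0 + depth)              # nearer -> higher (depth is relative)
    size = min(bbox_area / frame_area, 1.0)      # bigger box -> closer -> riskier
    below_horizon = 1.0 if y_bottom > horizon_y else 0.3  # on the ground plane?
    class_weight = 1.0 if cls in high_risk_classes else 0.5
    score = class_weight * (0.5 * proximity + 0.3 * size + 0.2 * below_horizon)
    return min(score, 1.0)
```

A nearby person then scores well above a distant stationary object, which is the ordering the audio warnings need.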
The engine generates:
- high-risk audio warnings
- distance-aware messages
- object-position descriptions (“ahead”, “left”, “right”)
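Message construction can be as simple as splitting the frame into thirds and templating a sentence. The exact phrasing and the thirds split are assumptions for illustration:

```python
def describe(label: str, distance: float, x_center: float, frame_width: int) -> str:
    """Build a spoken message like the ones above (format is illustrative)."""
    third = frame_width / 3
    if x_center < third:
        position = "left"
    elif x_center < 2 * third:
        position = "ahead"
    else:
        position = "right"
    return f"{label} {position}, about {distance:.0f} meters"
```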
From path_finder.py:
- analyzes object bounding boxes
- identifies free vertical gaps
- chooses widest traversable path
- used to determine safe forward direction
- if no safe forward gap is found → turn prediction
- chooses left or right based on depth + empty space
- audio output:
  - “Turn slightly left”
  - “Turn right”
  - “Clear ahead”
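The core of gap detection is an interval sweep over the horizontal extents of the boxes. The sketch below conveys the idea behind path_finder.py but is not its actual code:

```python
def widest_gap(boxes, frame_width):
    """Return (start, end) of the widest free horizontal span between boxes.

    Boxes are (x1, y1, x2, y2); only their horizontal extent matters here.
    """
    # Collect occupied x-intervals and merge any that overlap
    occupied = sorted((b[0], b[2]) for b in boxes)
    merged = []
    for x1, x2 in occupied:
        if merged and x1 <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], x2))
        else:
            merged.append((x1, x2))
    # Scan the free spans between merged intervals
    gaps, cursor = [], 0
    for x1, x2 in merged:
        if x1 > cursor:
            gaps.append((cursor, x1))
        cursor = max(cursor, x2)
    if cursor < frame_width:
        gaps.append((cursor, frame_width))
    if not gaps:
        return None
    return max(gaps, key=lambda g: g[1] - g[0])
```

Once the widest gap is known, choosing between "Clear ahead" and a turn reduces to comparing the gap's center with the frame center.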
Objects are mapped to relative angular zones:
- “12 o’clock”
- “2 o’clock”
- “9 o’clock”
for intuitive spoken feedback.
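The clock-zone mapping can be derived from the object's horizontal position and an assumed camera field of view. Both the 90-degree FOV default and the function itself are illustrative assumptions:

```python
def clock_zone(x_center: float, frame_width: int, fov_degrees: float = 90.0) -> str:
    """Map a horizontal image position to a clock direction.

    Straight ahead is 12 o'clock; the camera is assumed to span
    `fov_degrees` horizontally.
    """
    # Angle of the object relative to the optical axis, in degrees
    angle = (x_center / frame_width - 0.5) * fov_degrees
    # 30 degrees per clock hour; round to the nearest hour
    hour = (12 + round(angle / 30.0)) % 12 or 12
    return f"{hour} o'clock"
```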
Two modes:
- Desktop audio:
  - Uses Windows TTS (PowerShell SAPI)
  - Plays prioritized spoken messages
- Mobile audio:
  - Browser-based audio via WebSocket
  - Phone receives all spoken guidance
  - Phone can send gyro data back for future use
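Driving Windows SAPI through PowerShell can be sketched as building and running a one-line PowerShell script. This is an illustration of the approach named above; tts.py may differ in detail:

```python
def build_tts_command(text: str) -> list[str]:
    """Build a PowerShell command that speaks `text` via Windows SAPI."""
    escaped = text.replace("'", "''")  # escape single quotes for PowerShell
    script = ("Add-Type -AssemblyName System.Speech; "
              "(New-Object System.Speech.Synthesis.SpeechSynthesizer)"
              f".Speak('{escaped}')")
    return ["powershell", "-NoProfile", "-Command", script]

# On Windows one would then run it with, e.g.:
# subprocess.run(build_tts_command("Turn slightly left"), check=False)
```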
To enable mobile mode:
python src/main.py --camera 0 --mobile

Then open mobile.html on your phone (must be on the same WiFi network).
┌──────────────────────────────┐
│ Camera / Video │
└───────────────┬──────────────┘
▼
┌──────────────────────────────┐
│ YOLOv8 Detection │
└───────────────┬──────────────┘
▼
┌──────────────────────────────┐
│ Kalman Tracking │
└───────────────┬──────────────┘
▼
┌──────────────────────────────┐
│ MiDaS Depth Model │
└───────────────┬──────────────┘
▼
┌─────────────────────────────────────────────────────┐
│ Risk Engine + Gap Detection + Turning │
└───────────────┬─────────────────────────────────────┘
▼
┌──────────────────────────────┐
│ Audio Output (TTS) │
└──────────────────────────────┘
To set up the environment and run the project, follow these steps:
git clone https://github.com/<your-username>/AssistedVision.git
cd AssistedVision
conda create -n assistedvision python=3.10 -y
conda activate assistedvision
pip install -r requirements.txt

Project structure:

AssistedVision/
├── data/
│ ├── raw/ # Raw video samples (optional)
│ ├── processed/ # Saved processed outputs
│ ├── samples/ # Demo input videos
│ └── myvideo.mp4 # Example user-added video
│
├── src/
│ ├── main.py # Main pipeline: detection + depth + risk + audio + UI
│ ├── detection.py # YOLOv8 detection wrapper
│ ├── depth.py # MiDaS depth estimation
│ ├── tracker.py # Kalman filter + object tracking
│ ├── risk.py # Rule-based risk engine
│ ├── prob_risk.py # Probabilistic risk scoring (experimental)
│ ├── path_finder.py # Gap detection + navigation decision module
│ ├── instaYOLO_seg.py # YOLO segmentation (optional module)
│ ├── tts.py # Text-to-speech (Windows SAPI)
│ ├── viz.py # Visualization utilities and drawing
│ ├── utils.py # Helper functions
│ ├── mobile_server.py # WebSocket server for phone audio + gyro
│ └── __init__.py # Package initializer
│
├── mobile.html # Mobile companion interface (audio + gyroscope)
│
├── requirements.txt # Python dependencies
├── environment.yml # Conda environment file (optional)
│
├── run.sh # Linux run script (optional)
├── start_mobile.bat # Windows helper script (mobile mode)
├── test_webcam.ps1 # Webcam test script for Windows
│
├── README.md # Project documentation
├── LICENSE # MIT license
└── COMPREHENSIVE_PROJECT_SUMMARY.md # Full technical write-up
AssistedVision can be run in three main modes:
- Real-time mode using your webcam
- Offline mode using a video file
- Android mobile mode (phone camera + phone audio + gyro)
- Connect your webcam.
- Open a terminal inside the project folder.
- Run the command below:
python src/main.py --camera 0 --yolo yolov8n.pt --output webcam_output.mp4

Step 1 — Put your video inside the data/ folder:
data/myvideo.mp4

Step 2 — Run this command:
python src/main.py --video data/myvideo.mp4 --yolo yolov8n.pt --output result.mp4

Note: Full mobile mode (phone camera + audio + gyro) is currently tested and supported on Android.
In this mode, your Android phone acts as the camera and audio device, while your laptop runs all computer vision and navigation logic.
- On your Android phone, install “IP Webcam” from the Google Play Store.
- Open the app and scroll down to “Start server”.
- Tap Start server.
- At the bottom of the screen, you will see a URL like:
http://192.168.0.15:8080/video
This is your phone camera stream URL. You will use this in the --camera argument.
Make sure the phone and laptop are on the same WiFi network.
In a terminal on your laptop, inside the project folder, run:
python src/main.py \
--camera http://PHONE_IP:8080/video \
--yolo yolov8n.pt \
  --mobile

Example:
python src/main.py --camera http://192.168.0.15:8080/video --yolo yolov8n.pt --mobile

This will:
- Use the Android phone camera as the video source
- Run YOLOv8, depth estimation, tracking, risk analysis, and navigation on the laptop
- Start the mobile communication server for audio + gyroscope
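The laptop-to-phone traffic is small JSON messages over the WebSocket. The schema below is an assumption for illustration; the actual protocol lives in mobile_server.py and mobile.html:

```python
import json

def guidance_message(text: str, priority: int = 1) -> str:
    """Serialize a spoken-guidance message for the phone (schema is assumed)."""
    return json.dumps({"type": "speak", "text": text, "priority": priority})

def parse_gyro(raw: str):
    """Parse a gyro reading sent back by the phone (same schema caveat)."""
    msg = json.loads(raw)
    if msg.get("type") != "gyro":
        return None
    return (msg["alpha"], msg["beta"], msg["gamma"])
```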
From the project root on the laptop, start a simple HTTP server:
python -m http.server 8000

This makes mobile.html available over the network.
- On the same Android phone, open Chrome.
- In the address bar, go to:
http://YOUR_PC_IP:8000/mobile.html
For example:
http://192.168.0.20:8000/mobile.html
- When the page loads, allow:
  - Motion / gyroscope access
  - Audio permissions if prompted
- The status on the page should indicate that the phone is connected to the PC.
If the stream is slow, you can add:
--imgsz 320 --skip-depth 5 --process-every 2

Example:
python src/main.py \
--camera http://192.168.0.15:8080/video \
--yolo yolov8n.pt \
--mobile \
--imgsz 320 \
--skip-depth 5 \
--process-every 2