Rioc — AI Guard Backend

Automated AI security guard service. Detects people via camera (local YOLO or external webhook), analyzes frames with a vision LLM (MiniCPM-o / MiniCPM-V), and conducts two-way voice conversations with detected intruders through an IP speaker.

Built with FastAPI (Python). Runs as a single process with background tasks for vision analysis, audio transcription, and conversation management.
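The single-process design can be pictured as several long-running coroutines sharing one event loop. The following is a minimal sketch using plain `asyncio` rather than the actual FastAPI startup code; the names `periodic` and `run_service` and the three loop names are illustrative assumptions, not identifiers from `main.py`.

```python
import asyncio


async def periodic(name: str, interval_sec: float, counters: dict[str, int]) -> None:
    """Run one background loop until cancelled, counting its iterations."""
    while True:
        counters[name] = counters.get(name, 0) + 1  # stand-in for real work
        await asyncio.sleep(interval_sec)


async def run_service(duration_sec: float) -> dict[str, int]:
    """Start all loops in one event loop, let them run, then shut down cleanly."""
    counters: dict[str, int] = {}
    tasks = [
        asyncio.create_task(periodic("vision", 0.01, counters)),
        asyncio.create_task(periodic("transcription", 0.01, counters)),
        asyncio.create_task(periodic("conversation", 0.01, counters)),
    ]
    await asyncio.sleep(duration_sec)
    for task in tasks:
        task.cancel()
    # return_exceptions=True absorbs the CancelledError raised in each task.
    await asyncio.gather(*tasks, return_exceptions=True)
    return counters


if __name__ == "__main__":
    print(asyncio.run(run_service(0.05)))
```

Because everything shares one event loop, blocking work (model calls, audio I/O) would need to be awaited or moved to a thread so that one loop does not stall the others.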

Architecture

                         ┌────────────────────────────┐
                         │   vLLM Server (GPU)        │
                         │   MiniCPM-o / MiniCPM-V    │
                         │   OpenAI-compatible API    │
                         └─────────▲──────────────────┘
                                   │ HTTPS
┌──────────┐  webhook/   ┌─────────┴──────────────────┐   WebSocket/HTTP    ┌────────────┐
│ CVR /    ├────────────►│   Rioc (this service)      ├────────────────────►│ IP Speaker │
│ Camera   │  RTSP       │   FastAPI on :8000         │                     │ (Fanvil)   │
└──────────┘             └─────────┬──────────────────┘                     └────────────┘
                                   │ HTTPS
                         ┌─────────▼──────────────────┐
                         │   OpenAI API               │
                         │   STT (Whisper / gpt-4o)   │
                         │   TTS (tts-1)              │
                         └────────────────────────────┘

Prerequisites

Python 3.12+
ffmpeg: audio format conversion for speaker TTS (`brew install ffmpeg` or `apt install ffmpeg`)
Microphone: for two-way conversation (optional; the IP speaker's built-in mic can be used via USB)

External Services

Service	Required?	Purpose	Notes
vLLM server (self-hosted)	Yes	Vision LLM inference (MiniCPM-o-4_5 or MiniCPM-V-2_6)	OpenAI-compatible API. See DEPLOY_AWS.md for GPU setup
OpenAI API	Yes (for audio)	STT via Whisper/gpt-4o-transcribe, TTS via tts-1	Requires `OPENAI_STT_API_KEY`
SQLite	Auto	Conversation history (`ai_guard.db`, local file)	No setup needed — created automatically
Ollama	Optional	Local vision analysis (dev/demo fallback)	Only when `ENABLE_LOCAL_AUDIT=1`
VideoDB	Optional	Real-time video/audio indexing	Only when `ENABLE_VIDEODB=1`
IP Speaker (Fanvil CS20 etc.)	Optional	Loudspeaker output + mic input	WebSocket G.711 or HTTP play

No Redis, no external SQL database, no message queue.

Install

cd rioc/
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Configuration

Copy the example env file and fill in values:

cp .env.example .env

Required Environment Variables

Variable	Description	Example
`OPENAI_STT_API_KEY`	OpenAI API key for STT and TTS	`sk-proj-...`
`CLOUD_AI_URL`	vLLM server base URL (with `/v1`)	`http://172.16.128.41:8100/v1`
`CLOUD_AI_API_KEY`	API key for vLLM server	`token-minicpm-v45`

Optional Environment Variables

Variable	Default	Description
`ENABLE_CLOUD_AI`	`""` (off)	Set `1` to enable cloud vision analysis
`CLOUD_AI_MODEL`	`openbmb/MiniCPM-o-4_5-awq`	Model served by vLLM
`FRAME_SOURCE`	`webhook`	Person detection source: `webhook`, `local_yolo`, or `live_ffmpeg`
`CAMERA_RTSP_URL`	`""`	RTSP stream URL (if using local camera)
`ENABLE_LOCAL_AUDIT`	`""` (off)	Set `1` for local Ollama vision audit
`ENABLE_AUDIO_STT`	`""` (off)	Set `1` for microphone transcription
`ENABLE_SPEAKER_TTS`	`""` (off)	Set `1` to output TTS through IP speaker
`ENABLE_VIDEODB`	`""` (off)	Set `1` for VideoDB integration
`SPEAKER_URL`	`""`	IP speaker base URL (e.g. `https://192.168.10.183`)
`SPEAKER_WS_URL`	derived	WebSocket URL for speaker two-way audio
`SPEAKER_USER` / `SPEAKER_PASS`	`""`	Speaker auth credentials
`MINICPMO_URL`	`http://172.16.128.41:8101/`	MiniCPM-o conversation endpoint (port 8101)
`MINICPMO_API_KEY`	`token-minicpm-o45`	API key for MiniCPM-o conversation server
`AUDIO_INPUT_DEVICE`	system default	Mic device index or name substring (e.g. `Fanvil`)
`AUDIT_INTERVAL_SEC`	`2.0`	Seconds between vision audit cycles
`AUDIT_AI_FRAME_SIZE`	`320`	Frame resize before sending to cloud AI
`ENABLE_YOLO`	`1`	YOLO person pre-filter (reduces LLM calls)
`YOLO_CONFIDENCE`	`0.45`	YOLO detection confidence threshold
`ALERT_COOLDOWN_SEC`	`30.0`	Min seconds between detection alerts
`TTS_COOLDOWN_SEC`	`20.0`	Min seconds between TTS announcements
`CONVERSATION_MAX_TURNS`	`6`	Max conversation turns before ending
`CONVERSATION_COOLDOWN_SEC`	`20.0`	Cooldown between conversations
`TTS_PUBLIC_URL`	`""`	Public URL for speaker to fetch TTS audio (for play-from-URL mode)
`ENABLE_LOCAL_PLAYBACK`	`""` (off)	Play TTS through Mac speakers (testing)
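As an illustration of how these variables could be read at startup, here is a minimal sketch. It assumes that only the literal value `1` turns a flag on, which matches the table but is not confirmed against `main.py`; the `Settings` class and the `flag` and `load_settings` helpers are hypothetical names, and only a handful of the variables are covered.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def flag(env: Mapping[str, str], name: str, default: str = "") -> bool:
    """Treat "1" as on and anything else (including unset or empty) as off."""
    return env.get(name, default).strip() == "1"


@dataclass(frozen=True)
class Settings:
    enable_cloud_ai: bool
    enable_yolo: bool
    frame_source: str
    audit_interval_sec: float
    conversation_max_turns: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build a Settings object, applying the defaults listed in the table."""
    frame_source = env.get("FRAME_SOURCE", "webhook")
    if frame_source not in {"webhook", "local_yolo", "live_ffmpeg"}:
        raise ValueError(f"unsupported FRAME_SOURCE: {frame_source!r}")
    return Settings(
        enable_cloud_ai=flag(env, "ENABLE_CLOUD_AI"),
        enable_yolo=flag(env, "ENABLE_YOLO", default="1"),  # on by default
        frame_source=frame_source,
        audit_interval_sec=float(env.get("AUDIT_INTERVAL_SEC", "2.0")),
        conversation_max_turns=int(env.get("CONVERSATION_MAX_TURNS", "6")),
    )
```

Passing a plain dict instead of `os.environ` makes the defaults easy to check in a test, for example `load_settings({})` or `load_settings({"ENABLE_CLOUD_AI": "1"})`.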

Run

source .venv/bin/activate
uvicorn main:app --host 0.0.0.0 --port 8000

Production (webhook mode — default)

The default `FRAME_SOURCE=webhook` mode expects person-detection events from an external CVR system via `POST /api/person-detected`. No local camera or YOLO loop is needed.

ENABLE_CLOUD_AI=1 \
ENABLE_AUDIO_STT=1 \
ENABLE_SPEAKER_TTS=1 \
uvicorn main:app --host 0.0.0.0 --port 8000
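For testing webhook mode without a CVR, a detection event can be posted by hand. The request body format is not documented here, so the sketch below assumes a JSON payload with a base64-encoded JPEG; the field names `camera_id` and `frame_jpeg_b64` and the helper `build_person_detected_request` are hypothetical and should be checked against the handler in `main.py`.

```python
import base64
import json
import urllib.request


def build_person_detected_request(
    base_url: str, jpeg_bytes: bytes, camera_id: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/person-detected."""
    # Hypothetical payload: field names are illustrative, not the service's schema.
    payload = {
        "camera_id": camera_id,
        "frame_jpeg_b64": base64.b64encode(jpeg_bytes).decode("ascii"),
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/person-detected",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_person_detected_request(
        "http://localhost:8000", b"\xff\xd8\xff\xd9", "gate-1"
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req, timeout=5)
```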

Development (local YOLO)

Uses a local camera and YOLO for person detection:

FRAME_SOURCE=local_yolo \
CAMERA_RTSP_URL=rtsp://... \
ENABLE_CLOUD_AI=1 \
uvicorn main:app --host 0.0.0.0 --port 8000
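In this mode, `YOLO_CONFIDENCE` and `ALERT_COOLDOWN_SEC` together decide when a detection becomes an alert. The sketch below shows one plausible way to combine them; `AlertGate` is a hypothetical class, no YOLO library is called, and the per-frame person confidences are assumed to come from the detector.

```python
import time
from typing import Callable, Iterable


class AlertGate:
    """Pass a detection only if it is confident enough and outside the cooldown."""

    def __init__(
        self,
        min_confidence: float = 0.45,   # default of YOLO_CONFIDENCE
        cooldown_sec: float = 30.0,     # default of ALERT_COOLDOWN_SEC
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.min_confidence = min_confidence
        self.cooldown_sec = cooldown_sec
        self._clock = clock
        self._last_alert: float | None = None

    def should_alert(self, person_confidences: Iterable[float]) -> bool:
        """Return True if this frame should trigger a new alert."""
        best = max(person_confidences, default=0.0)
        if best < self.min_confidence:
            return False  # no sufficiently confident person in this frame
        now = self._clock()
        if self._last_alert is not None and now - self._last_alert < self.cooldown_sec:
            return False  # still inside the cooldown window
        self._last_alert = now
        return True
```

Filtering on confidence before the cooldown check means low-confidence frames never reset or consume the cooldown, which keeps the number of vision LLM calls down.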

API Endpoints

Core

Method	Path	Description
`GET`	`/`	HTML status page
`GET`	`/video`	MJPEG video stream
`GET`	`/transcript`	Latest audio transcript
`GET`	`/analysis`	Latest cloud AI vision analysis

Detections & Events

Method	Path	Description
`GET`	`/detections`	Recent detection history (JSON array)
`GET`	`/detections/stream`	SSE stream of real-time detection events
`GET`	`/events`	In-memory event log (last 500 events)
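A client of `/detections/stream` needs to parse the Server-Sent Events wire format. The following sketch handles the standard `data:` framing and assumes each event body is a JSON object, which is not confirmed here; `iter_sse_data` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON object per SSE event from an iterable of text lines."""
    data_parts: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                       # blank line terminates an event
            if data_parts:
                yield json.loads("\n".join(data_parts))
                data_parts = []
        elif line.startswith(":"):           # comment / keep-alive line
            continue
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips a single leading space after the colon.
            data_parts.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.
```

The function accepts any iterable of decoded text lines, so it can be fed from a streaming HTTP response or from a recorded sample when testing.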

Conversations (AI Guard two-way voice)

Method	Path	Description
`POST`	`/conversation/start`	Start a new conversation manually
`POST`	`/conversation/respond`	Inject a text response into active conversation
`GET`	`/conversation/status`	Current conversation state
`GET`	`/conversation/stream`	SSE stream of conversation turns
`POST`	`/conversation/configure`	Update system prompt / max turns at runtime
`GET`	`/conversations`	List past conversations (from SQLite)
`GET`	`/conversations/{id}`	Get a conversation with all turns

Webhooks & Integration

Method	Path	Description
`POST`	`/api/person-detected`	Webhook: receive person-detection events (with JPEG frame)
`POST`	`/api/frame-update`	Push a camera frame update
`POST`	`/configure`	Reconfigure camera RTSP URL at runtime

Speaker & TTS

Method	Path	Description
`POST`	`/tts/test`	Test TTS through speaker
`GET`	`/tts/latest.mp3`	Latest generated TTS audio file
`GET`	`/speaker-test`	Test speaker connectivity
`GET`	`/speaker-diagnostic`	Speaker connection diagnostics
`GET`	`/speaker-test-bell`	Play test bell sound on speaker

Data Storage

`ai_guard.db`: SQLite database, auto-created in the project root. Stores conversation history and turns. No migrations are needed; tables are created on first run.
`audio_logs/`: Directory for saved audio recordings from conversations (auto-created).
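Creating tables on first run without migrations is typically done with `CREATE TABLE IF NOT EXISTS`. The sketch below illustrates that pattern with the standard `sqlite3` module; the table and column names are assumptions for illustration and may differ from the actual schema in `db.py`.

```python
import sqlite3

# Hypothetical schema: two tables, one row per conversation and one per turn.
SCHEMA = """
CREATE TABLE IF NOT EXISTS conversations (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    started_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE TABLE IF NOT EXISTS turns (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    conversation_id INTEGER NOT NULL REFERENCES conversations(id),
    role            TEXT NOT NULL,
    text            TEXT NOT NULL
);
"""


def open_db(path: str = "ai_guard.db") -> sqlite3.Connection:
    """Open the database, creating the file and tables if they are missing."""
    conn = sqlite3.connect(path)    # creates the file if it does not exist
    conn.executescript(SCHEMA)      # idempotent: safe to run on every startup
    return conn


def start_conversation(conn: sqlite3.Connection) -> int:
    """Insert a new conversation row and return its id."""
    cur = conn.execute("INSERT INTO conversations DEFAULT VALUES")
    conn.commit()
    return cur.lastrowid


def add_turn(conn: sqlite3.Connection, conversation_id: int, role: str, text: str) -> None:
    """Append one turn to an existing conversation."""
    conn.execute(
        "INSERT INTO turns (conversation_id, role, text) VALUES (?, ?, ?)",
        (conversation_id, role, text),
    )
    conn.commit()
```

Passing `":memory:"` as the path gives a throwaway database, which is convenient for trying the schema without touching `ai_guard.db`.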

Files

File	Purpose
`main.py`	FastAPI app, endpoints, YOLO detection, TTS/speaker, background loops
`conversation_manager.py`	Two-way voice conversation state machine (WARNING → ESCALATING → FINAL)
`minicpmo_client.py`	MiniCPM-o API client (vision + audio, with fallback)
`mic_listener.py`	VAD-gated microphone capture (WebRTC VAD)
`db.py`	SQLite schema and CRUD for conversations/turns
`vision_test.py`	Standalone Ollama vision test script
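The WARNING → ESCALATING → FINAL progression in `conversation_manager.py` can be pictured as a function of the turn number. The sketch below is only an illustration under an assumed rule (first turn warns, last allowed turn is final, everything in between escalates); the real transition logic is not described here, and `Stage` and `stage_for_turn` are hypothetical names.

```python
from enum import Enum


class Stage(Enum):
    WARNING = "WARNING"
    ESCALATING = "ESCALATING"
    FINAL = "FINAL"


def stage_for_turn(turn: int, max_turns: int = 6) -> Stage:
    """Map a 1-based turn number to a stage under the assumed rule above."""
    if not 1 <= turn <= max_turns:
        raise ValueError(f"turn must be in 1..{max_turns}, got {turn}")
    if turn == 1:
        return Stage.WARNING
    if turn == max_turns:
        return Stage.FINAL
    return Stage.ESCALATING


if __name__ == "__main__":
    # Default max_turns mirrors CONVERSATION_MAX_TURNS=6.
    print([stage_for_turn(t).value for t in range(1, 7)])
```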

Cloud Brain (AWS deployment)

For deploying the vision LLM (vLLM) on AWS or RunPod GPU instances, see DEPLOY_AWS.md.
