Fix clip_timestamps format, Windows UTF-8 console, and non-dict LLM response by Ahmed-Ezzat20 · Pull Request #4 · bakrianoo/mazinger

Ahmed-Ezzat20 · 2026-03-30T18:09:19Z

Summary

Three bug fixes discovered during end-to-end testing of the pipeline:

1. `clip_timestamps` format bug (faster-whisper backend)

`_transcribe_faster_whisper()` passes `speech_clips_sec` to `clip_timestamps` as a list of dicts (`[{"start": 0.5, "end": 3.2}, ...]`), but `BatchedInferencePipeline.transcribe()` expects a flat list of seconds (`[0.5, 3.2, ...]`). This raises a `TypeError` at runtime.

Fix: Convert to the flat `[start1, end1, start2, end2, ...]` format.
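A minimal sketch of the conversion, assuming `speech_clips_sec` holds dicts with `"start"` and `"end"` keys in seconds, as described above. The helper name `flatten_clip_timestamps` is hypothetical; the PR description does not show the actual code inside `_transcribe_faster_whisper()`.

```python
from typing import Iterable, List, Mapping


def flatten_clip_timestamps(clips: Iterable[Mapping[str, float]]) -> List[float]:
    """Turn [{"start": s1, "end": e1}, ...] into [s1, e1, s2, e2, ...].

    Hypothetical helper; the start/end key names follow the PR description.
    """
    flat: List[float] = []
    for clip in clips:
        # Keep each start/end pair adjacent and in order: the flat format
        # relies on positions, not keys, to pair them up again.
        flat.extend((float(clip["start"]), float(clip["end"])))
    return flat
```

For example, `flatten_clip_timestamps([{"start": 0.5, "end": 3.2}, {"start": 4.0, "end": 7.5}])` gives `[0.5, 3.2, 4.0, 7.5]`, which is the shape the PR passes on as `clip_timestamps`.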

2. Windows console UnicodeEncodeError

On Windows, `sys.stdout` defaults to cp1252 encoding, which cannot represent Arabic, CJK, or other non-Latin characters. Any `print()` call or log message containing a non-Latin project slug (e.g. `مين-هو-مستر-عزت`) crashes with `UnicodeEncodeError`.

Fix: Reconfigure `stdout`/`stderr` to UTF-8 with `errors="replace"` at the CLI entry point, guarded by `sys.platform == "win32"`.
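A sketch of the guarded reconfiguration, assuming Python 3.7+, where `io.TextIOWrapper.reconfigure()` is available. Both function names are hypothetical. The `getattr` check is an added precaution that is not taken from the PR: it skips streams that do not support `reconfigure()`, for example when `sys.stdout` is `None` under `pythonw` or has been replaced by a custom object.

```python
import sys
from typing import Any


def reconfigure_utf8(stream: Any) -> bool:
    """Switch a text stream to UTF-8 with errors="replace" if it supports reconfigure().

    Returns True when the stream was reconfigured, False otherwise.
    """
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        # e.g. sys.stdout is None under pythonw, or was swapped for a custom object
        return False
    # errors="replace" writes "?" for anything that cannot be encoded instead of raising
    reconfigure(encoding="utf-8", errors="replace")
    return True


def setup_console_encoding(platform: str = sys.platform) -> None:
    """Hypothetical CLI-entry hook: only touch the standard streams on Windows."""
    if platform == "win32":
        reconfigure_utf8(sys.stdout)
        reconfigure_utf8(sys.stderr)
```

Calling `setup_console_encoding()` once at the top of the CLI entry point, before anything is printed, leaves other platforms untouched. Taking `platform` as a parameter is only there so the Windows branch can be exercised in tests.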

3. `describe.py` crash on non-dict LLM response

`json_repair.loads()` can return a list, a string, or `None` when small/local LLMs (e.g. Ollama `qwen3:4b`) produce malformed JSON. The subsequent `.get()` call then crashes with `AttributeError: 'list' object has no attribute 'get'`.

Fix: Guard with an `isinstance(description, dict)` check and wrap non-dict responses into the expected schema.
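A sketch of the guard, assuming the value returned by `json_repair.loads()` is passed in directly. The `"summary"` key and the `normalize_description` name are hypothetical stand-ins, since the PR does not list the fields of the expected schema. Joining list items into a single string is likewise just one possible way to wrap a non-dict response.

```python
from typing import Any, Dict

# Hypothetical schema key; the real field names used by describe.py are not shown in the PR.
FALLBACK_KEY = "summary"


def normalize_description(parsed: Any) -> Dict[str, Any]:
    """Always return a dict so callers can safely use .get() on the result."""
    if isinstance(parsed, dict):
        # Well-formed response: pass it through unchanged.
        return parsed
    if parsed is None:
        text = ""
    elif isinstance(parsed, list):
        # e.g. the model emitted a JSON array of strings instead of an object
        text = " ".join(str(item) for item in parsed)
    else:
        text = str(parsed)
    # Wrap the stray value into the (assumed) expected shape.
    return {FALLBACK_KEY: text}
```

With this in place, `normalize_description(["a", "b"]).get("summary")` returns `"a b"` instead of raising `AttributeError`.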

Test plan

- Verified that all modified modules import correctly
- Test faster-whisper transcription with `--method faster-whisper` (requires the model, on either GPU or CPU)
- Test the pipeline with non-Latin project names on Windows
- Test the describe stage with a small local LLM (e.g. Ollama `qwen3:4b`)

…esponse

- faster-whisper: convert `clip_timestamps` to a flat list of seconds `[start1, end1, ...]` instead of a list of dicts, fixing the `TypeError` in `BatchedInferencePipeline.transcribe()`
- cli: reconfigure stdout/stderr to UTF-8 on Windows to prevent `UnicodeEncodeError` on non-Latin project names
- describe: guard against `json_repair.loads()` returning a non-dict (e.g. a list or string) from small/local LLMs

Ahmed-Ezzat20 force-pushed the `fix/stt-and-cli-bugs` branch from 4f62228 to 33e5741 on April 9, 2026 00:17

Ahmed-Ezzat20 wants to merge 1 commit into `bakrianoo:master` from `Ahmed-Ezzat20:fix/stt-and-cli-bugs`