Add Google Gemini as an LLM provider #6

Open
Ahmed-Ezzat20 wants to merge 1 commit into bakrianoo:master from Ahmed-Ezzat20:feat/gemini-llm-provider

Summary

  • Adds Google Gemini as a third LLM provider alongside OpenAI and Ollama
  • Implements _GeminiClient that mimics the client.chat.completions.create() interface, so all existing pipeline stages (describe, review, translate, resegment, thumbnails) work without changes
  • Supports text, multimodal (base64 images), and streaming via generate_content_stream
  • Converts OpenAI message format to Gemini's Content/Part types (including system → system_instruction)
  • New optional dependency: pip install "mazinger[llm-gemini]"
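
The message conversion described above can be sketched roughly as follows. This is an illustrative approximation, not the code from llm.py: `convert_messages` and `_to_parts` are hypothetical names, and the output uses plain dicts shaped like Gemini's Content/Part structures (`role`, `parts`, `inline_data`) rather than the SDK's own types. It assumes images arrive as base64 data URLs and that assistant turns map to Gemini's `model` role.

```python
import base64
import re

# Matches data URLs such as "data:image/png;base64,AAAA..."
_DATA_URL = re.compile(r"^data:(?P<mime>[\w/+.-]+);base64,(?P<data>.+)$", re.DOTALL)


def _to_parts(content):
    """Turn OpenAI message content (str or list of typed items) into Gemini-style parts."""
    if isinstance(content, str):
        return [{"text": content}]
    parts = []
    for item in content:
        if item.get("type") == "text":
            parts.append({"text": item["text"]})
        elif item.get("type") == "image_url":
            match = _DATA_URL.match(item["image_url"]["url"])
            if match is None:
                raise ValueError("only base64 data URLs are handled in this sketch")
            parts.append({
                "inline_data": {
                    "mime_type": match["mime"],
                    "data": base64.b64decode(match["data"]),
                }
            })
    return parts


def convert_messages(messages):
    """Split OpenAI-style messages into (system_instruction, contents)."""
    system_chunks = []
    contents = []
    for message in messages:
        role = message["role"]
        parts = _to_parts(message["content"])
        if role == "system":
            # System messages are pulled out of the turn list and merged.
            system_chunks.extend(p["text"] for p in parts if "text" in p)
            continue
        # Assumption: anything that is not an assistant turn is treated as "user".
        gemini_role = "model" if role == "assistant" else "user"
        contents.append({"role": gemini_role, "parts": parts})
    system_instruction = "\n".join(system_chunks) or None
    return system_instruction, contents
```

Keeping the system text separate from the turn list is what lets it be passed as a system instruction instead of as a regular message.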

Changes

| File | Change |
| --- | --- |
| llm.py | New _GeminiChatCompletions, _GeminiChat, _GeminiClient classes; gemini_api_key param on build_client() |
| cli/_groups.py | New add_gemini() helper, wired into add_llm() and make_llm_client() |
| cli/_dub.py | Pass gemini_api_key to MazingerDubber |
| cli/_transcribe.py | Pass gemini_api_key to build_client() for ASR review |
| pipeline.py | gemini_api_key on __init__ and _llm_client() |
| pyproject.toml | New llm-gemini optional extra (google-genai>=1.0) |

Usage

# Set API key
export GEMINI_API_KEY=your_key

# Full dubbing pipeline with Gemini for all LLM stages
mazinger dub video.mp4 --llm-model gemini-2.5-flash

# Or pass key explicitly
mazinger dub video.mp4 --gemini-api-key KEY --llm-model gemini-2.5-flash

Test plan

  • Verified text completion with gemini-2.5-flash — correct response and usage metrics
  • Verified multimodal (image) completion — correctly identifies image content
  • Verified CLI parser builds without conflicts across all subcommands
  • Verified MazingerDubber accepts and passes gemini_api_key through to build_client()
  • Still to do: test the full dub pipeline end-to-end with Gemini as the LLM provider

Ahmed-Ezzat20 force-pushed the feat/gemini-llm-provider branch from c30773e to 08280f1 on March 30, 2026 19:12
Adds a _GeminiClient that implements the same client.chat.completions.create()
interface used by the OpenAI and Ollama backends, routing to the google-genai
SDK internally. Supports text, multimodal (images), and streaming.
Auto-retries without penalty params when the model rejects them.

Usage: pass --gemini-api-key or set GEMINI_API_KEY, then use any Gemini model
name (e.g. --llm-model gemini-2.5-flash) for translation/analysis stages.
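
A minimal sketch of the adapter shape and the retry behaviour described in this commit message, under stated assumptions. `SketchClient`, `_Chat`, `_Completions`, and the `generate` callable are hypothetical stand-ins for the PR's classes and for the google-genai call. The names `frequency_penalty` and `presence_penalty` are assumed to be the "penalty params" in question, and catching a bare `Exception` is a simplification that real code would narrow to the SDK's error type.

```python
# Assumption: these are the penalty parameters a model may reject.
PENALTY_PARAMS = ("frequency_penalty", "presence_penalty")


class _Completions:
    def __init__(self, generate):
        # `generate` is a hypothetical backend callable standing in for the SDK call.
        self._generate = generate

    def create(self, *, model, messages, **params):
        try:
            return self._generate(model=model, messages=messages, **params)
        except Exception:
            stripped = {k: v for k, v in params.items() if k not in PENALTY_PARAMS}
            if len(stripped) == len(params):
                raise  # no penalty params were sent, so a retry would not help
            # Retry once without the penalty params.
            return self._generate(model=model, messages=messages, **stripped)


class _Chat:
    def __init__(self, generate):
        self.completions = _Completions(generate)


class SketchClient:
    """Exposes client.chat.completions.create() like the OpenAI client does."""

    def __init__(self, generate):
        self.chat = _Chat(generate)
```

Because the nesting matches `client.chat.completions.create()`, calling code written against the OpenAI client does not need to know which backend it is talking to.
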
Ahmed-Ezzat20 force-pushed the feat/gemini-llm-provider branch from 08280f1 to d79fbe2 on April 9, 2026 00:17