From c601dda389989e96280dbb7a850fad838ef97c3e Mon Sep 17 00:00:00 2001
From: Peter Steinberger
Date: Mon, 16 Mar 2026 23:20:28 -0700
Subject: [PATCH] docs(image-generation): document google provider

---
 docs/gateway/configuration-reference.md | 1 +
 docs/help/testing.md                    | 9 +++++++++
 docs/tools/capability-cookbook.md       | 2 +-
 docs/tools/plugin.md                    | 6 ++++--
 4 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/docs/gateway/configuration-reference.md b/docs/gateway/configuration-reference.md
index 910b6db2b62..ee823da9cac 100644
--- a/docs/gateway/configuration-reference.md
+++ b/docs/gateway/configuration-reference.md
@@ -877,6 +877,7 @@ Time format in system prompt. Default: `auto` (OS preference).
       },
       imageGenerationModel: {
         primary: "openai/gpt-image-1",
+        fallbacks: ["google/gemini-3.1-flash-image-preview"],
       },
       pdfModel: {
         primary: "anthropic/claude-opus-4-6",
diff --git a/docs/help/testing.md b/docs/help/testing.md
index ab63db23670..f3315fa6faa 100644
--- a/docs/help/testing.md
+++ b/docs/help/testing.md
@@ -360,6 +360,15 @@ If you want to rely on env keys (e.g. exported in your `~/.profile`), run local
 - Enable: `BYTEPLUS_API_KEY=... BYTEPLUS_LIVE_TEST=1 pnpm test:live src/agents/byteplus.live.test.ts`
 - Optional model override: `BYTEPLUS_CODING_MODEL=ark-code-latest`
 
+## Google image generation live
+
+- Test: `src/image-generation/providers/google.live.test.ts`
+- Enable: `GOOGLE_LIVE_TEST=1 pnpm test:live src/image-generation/providers/google.live.test.ts`
+- Key source: `GEMINI_API_KEY` or `GOOGLE_API_KEY`
+- Optional overrides:
+  - `GOOGLE_IMAGE_GENERATION_MODEL=gemini-3.1-flash-image-preview`
+  - `GOOGLE_IMAGE_BASE_URL=https://generativelanguage.googleapis.com/v1beta`
+
 ## Docker runners (optional “works in Linux” checks)
 
 These run `pnpm test:live` inside the repo Docker image, mounting your local config dir and workspace (and sourcing `~/.profile` if mounted). They also bind-mount CLI auth homes like `~/.codex`, `~/.claude`, `~/.qwen`, and `~/.minimax` when present, then copy them into the container home before the run so external-CLI OAuth can refresh tokens without mutating the host auth store:
diff --git a/docs/tools/capability-cookbook.md b/docs/tools/capability-cookbook.md
index 345c7b1ebd6..5cfc94ef3c0 100644
--- a/docs/tools/capability-cookbook.md
+++ b/docs/tools/capability-cookbook.md
@@ -88,7 +88,7 @@ Image generation follows the standard shape:
 1. core defines `ImageGenerationProvider`
 2. core exposes `registerImageGenerationProvider(...)`
 3. core exposes `runtime.imageGeneration.generate(...)`
-4. the `openai` plugin registers an OpenAI-backed implementation
+4. the `openai` and `google` plugins register vendor-backed implementations
 5. future vendors can register the same contract without changing channels/tools
 
 The config key is separate from vision-analysis routing:
diff --git a/docs/tools/plugin.md b/docs/tools/plugin.md
index 77ff383feb6..b0eec032bcf 100644
--- a/docs/tools/plugin.md
+++ b/docs/tools/plugin.md
@@ -116,8 +116,10 @@ Examples:
   speech + media-understanding + image-generation behavior
 - the bundled `elevenlabs` plugin owns ElevenLabs speech behavior
 - the bundled `microsoft` plugin owns Microsoft speech behavior
-- the bundled `google`, `minimax`, `mistral`, `moonshot`, and `zai` plugins own
-  their media-understanding backends
+- the bundled `google` plugin owns Google model-provider behavior plus Google
+  media-understanding + image-generation + web-search behavior
+- the bundled `minimax`, `mistral`, `moonshot`, and `zai` plugins own their
+  media-understanding backends
 - the `voice-call` plugin is a feature plugin: it owns call transport, tools,
   CLI, routes, and runtime, but it consumes core TTS/STT capability instead of
   inventing a second speech stack
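
The registration flow and the `primary`/`fallbacks` ordering documented by this patch can be sketched roughly as follows. This is a minimal TypeScript sketch under stated assumptions: the type shapes, the `parseModelRef`, `resolveCandidates`, and `generateImage` helpers, and the try-next-on-error behavior are hypothetical illustrations, not the project's actual API. Only the names `ImageGenerationProvider` and `registerImageGenerationProvider` and the two model refs are taken from the docs touched above.

```typescript
// Hypothetical sketch: every signature here is an assumption made for illustration.
type ImageRequest = { prompt: string };
type ImageResult = { provider: string; model: string; bytes: Uint8Array };

// Assumed contract shape; the real interface in core may differ.
interface ImageGenerationProvider {
  id: string; // e.g. "openai" or "google"
  generate(model: string, req: ImageRequest): Promise<ImageResult>;
}

const providers = new Map<string, ImageGenerationProvider>();

// Plugins would call this once at load time (assumed behavior).
function registerImageGenerationProvider(p: ImageGenerationProvider): void {
  providers.set(p.id, p);
}

// Splits "google/gemini-3.1-flash-image-preview" at the first slash.
function parseModelRef(ref: string): { providerId: string; model: string } {
  const slash = ref.indexOf("/");
  if (slash <= 0 || slash === ref.length - 1) {
    throw new Error(`invalid model ref: ${ref}`);
  }
  return { providerId: ref.slice(0, slash), model: ref.slice(slash + 1) };
}

type ImageGenerationModelConfig = { primary: string; fallbacks?: string[] };

// Primary first, then fallbacks in the order they are configured.
function resolveCandidates(cfg: ImageGenerationModelConfig) {
  return [cfg.primary, ...(cfg.fallbacks ?? [])].map(parseModelRef);
}

// Tries each candidate in order; moves on when a provider is missing or throws.
async function generateImage(
  cfg: ImageGenerationModelConfig,
  req: ImageRequest,
): Promise<ImageResult> {
  const errors: string[] = [];
  for (const { providerId, model } of resolveCandidates(cfg)) {
    const provider = providers.get(providerId);
    if (!provider) {
      errors.push(`${providerId}/${model}: no provider registered`);
      continue;
    }
    try {
      return await provider.generate(model, req);
    } catch (err) {
      const msg = err instanceof Error ? err.message : String(err);
      errors.push(`${providerId}/${model}: ${msg}`);
    }
  }
  throw new Error(`all image generation models failed: ${errors.join("; ")}`);
}
```

Under these assumptions, a config of `{ primary: "openai/gpt-image-1", fallbacks: ["google/gemini-3.1-flash-image-preview"] }` would only reach the `google` provider when the `openai` one is unregistered or fails. Whether the real runtime falls back on every error or only on specific ones is not stated in the patch.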