docs(image-generation): document google provider

Peter Steinberger 2026-03-16 23:20:28 -07:00
parent 618d35f933
commit c601dda389
4 changed files with 15 additions and 3 deletions

@@ -877,6 +877,7 @@ Time format in system prompt. Default: `auto` (OS preference).
   },
   imageGenerationModel: {
     primary: "openai/gpt-image-1",
+    fallbacks: ["google/gemini-3.1-flash-image-preview"],
   },
   pdfModel: {
     primary: "anthropic/claude-opus-4-6",
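
The `imageGenerationModel` entry in this hunk pairs a primary model with an ordered fallback list. As a rough illustration of how such an entry could be expanded into an ordered candidate list, here is a minimal sketch. It assumes refs have the form `provider/model` and are split on the first slash; `ModelSelection`, `parseModelRef`, and `candidateModels` are hypothetical names, not identifiers from this repository.

```typescript
// Hypothetical shape mirroring the `imageGenerationModel` config entry.
interface ModelSelection {
  primary: string;
  fallbacks?: string[];
}

interface ModelRef {
  provider: string;
  model: string;
}

// Assumption: refs look like "provider/model" and split on the first slash only.
function parseModelRef(ref: string): ModelRef {
  const slash = ref.indexOf("/");
  if (slash <= 0 || slash === ref.length - 1) {
    throw new Error(`expected "provider/model", got "${ref}"`);
  }
  return { provider: ref.slice(0, slash), model: ref.slice(slash + 1) };
}

// Primary first, then fallbacks in declared order, skipping duplicates.
function candidateModels(selection: ModelSelection): ModelRef[] {
  const seen = new Set<string>();
  const refs: ModelRef[] = [];
  for (const ref of [selection.primary, ...(selection.fallbacks ?? [])]) {
    if (seen.has(ref)) continue;
    seen.add(ref);
    refs.push(parseModelRef(ref));
  }
  return refs;
}
```

With the values from this hunk, `candidateModels` would yield the `openai` ref first and the `google` ref second. Whether the real resolver deduplicates or validates refs this way is not stated in the commit.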

@@ -360,6 +360,15 @@ If you want to rely on env keys (e.g. exported in your `~/.profile`), run local
- Enable: `BYTEPLUS_API_KEY=... BYTEPLUS_LIVE_TEST=1 pnpm test:live src/agents/byteplus.live.test.ts`
- Optional model override: `BYTEPLUS_CODING_MODEL=ark-code-latest`
+## Google image generation live
+- Test: `src/image-generation/providers/google.live.test.ts`
+- Enable: `GOOGLE_LIVE_TEST=1 pnpm test:live src/image-generation/providers/google.live.test.ts`
+- Key source: `GEMINI_API_KEY` or `GOOGLE_API_KEY`
+- Optional overrides:
+  - `GOOGLE_IMAGE_GENERATION_MODEL=gemini-3.1-flash-image-preview`
+  - `GOOGLE_IMAGE_BASE_URL=https://generativelanguage.googleapis.com/v1beta`
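
The environment variables listed above could be read by a small helper along these lines. This is a sketch only, not the contents of `google.live.test.ts`: `readGoogleLiveSettings` and `firstNonEmpty` are hypothetical names, the precedence of `GEMINI_API_KEY` over `GOOGLE_API_KEY` is an assumption, and so is treating only the exact value `1` as "enabled".

```typescript
// Hypothetical helper; the environment is passed in so the sketch has no Node-specific types.
type Env = Record<string, string | undefined>;

interface GoogleLiveSettings {
  enabled: boolean;
  apiKey: string | undefined;
  model: string | undefined;
  baseUrl: string | undefined;
}

// Returns the first value that is non-empty after trimming, if any.
function firstNonEmpty(...values: (string | undefined)[]): string | undefined {
  for (const value of values) {
    const trimmed = value?.trim();
    if (trimmed) return trimmed;
  }
  return undefined;
}

function readGoogleLiveSettings(env: Env): GoogleLiveSettings {
  return {
    // Assumption: the live test is opt-in and only "1" turns it on.
    enabled: env["GOOGLE_LIVE_TEST"] === "1",
    // Assumption: GEMINI_API_KEY wins when both keys are set.
    apiKey: firstNonEmpty(env["GEMINI_API_KEY"], env["GOOGLE_API_KEY"]),
    // Overrides stay undefined so the caller can apply its own defaults.
    model: firstNonEmpty(env["GOOGLE_IMAGE_GENERATION_MODEL"]),
    baseUrl: firstNonEmpty(env["GOOGLE_IMAGE_BASE_URL"]),
  };
}
```

In a Node test file this would typically be called as `readGoogleLiveSettings(process.env)`, with the test skipped when `enabled` is false or `apiKey` is missing.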
## Docker runners (optional “works in Linux” checks)
These run `pnpm test:live` inside the repo Docker image, mounting your local config dir and workspace (and sourcing `~/.profile` if mounted). They also bind-mount CLI auth homes like `~/.codex`, `~/.claude`, `~/.qwen`, and `~/.minimax` when present, then copy them into the container home before the run so external-CLI OAuth can refresh tokens without mutating the host auth store:

@@ -88,7 +88,7 @@ Image generation follows the standard shape:
1. core defines `ImageGenerationProvider`
2. core exposes `registerImageGenerationProvider(...)`
3. core exposes `runtime.imageGeneration.generate(...)`
-4. the `openai` plugin registers an OpenAI-backed implementation
+4. the `openai` and `google` plugins register vendor-backed implementations
5. future vendors can register the same contract without changing channels/tools
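
The numbered shape above can be sketched as a small registry. Only the names `ImageGenerationProvider`, `registerImageGenerationProvider`, and `runtime.imageGeneration.generate` come from the documentation. Every field, parameter, and return type below is an assumption made for illustration, and the registered provider is a stub that makes no network call.

```typescript
// Illustrative request/response shapes; the real contract may differ.
interface ImageGenerationRequest {
  model: string;
  prompt: string;
}

interface GeneratedImage {
  mimeType: string;
  data: Uint8Array;
}

// Step 1: core defines the provider contract.
interface ImageGenerationProvider {
  id: string;
  generate(request: ImageGenerationRequest): Promise<GeneratedImage[]>;
}

const providers = new Map<string, ImageGenerationProvider>();

// Step 2: core exposes a registration hook for plugins.
function registerImageGenerationProvider(provider: ImageGenerationProvider): void {
  if (providers.has(provider.id)) {
    throw new Error(`image generation provider "${provider.id}" is already registered`);
  }
  providers.set(provider.id, provider);
}

// Step 3: core exposes a runtime entry point that dispatches by provider id.
const runtime = {
  imageGeneration: {
    async generate(providerId: string, request: ImageGenerationRequest): Promise<GeneratedImage[]> {
      const provider = providers.get(providerId);
      if (!provider) {
        throw new Error(`no image generation provider registered for "${providerId}"`);
      }
      return provider.generate(request);
    },
  },
};

// Step 4: a vendor plugin registers an implementation (stubbed here).
registerImageGenerationProvider({
  id: "google",
  async generate(request) {
    // Placeholder bytes standing in for a real vendor response.
    return [{ mimeType: "image/png", data: new Uint8Array([request.prompt.length]) }];
  },
});
```

Because callers go through `runtime.imageGeneration.generate(...)` and never import a vendor module directly, a further vendor only needs another `registerImageGenerationProvider(...)` call, which matches step 5.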
The config key is separate from vision-analysis routing:

@@ -116,8 +116,10 @@ Examples:
speech + media-understanding + image-generation behavior
- the bundled `elevenlabs` plugin owns ElevenLabs speech behavior
- the bundled `microsoft` plugin owns Microsoft speech behavior
-- the bundled `google`, `minimax`, `mistral`, `moonshot`, and `zai` plugins own
-  their media-understanding backends
+- the bundled `google` plugin owns Google model-provider behavior plus Google
+  media-understanding + image-generation + web-search behavior
+- the bundled `minimax`, `mistral`, `moonshot`, and `zai` plugins own their
+  media-understanding backends
- the `voice-call` plugin is a feature plugin: it owns call transport, tools,
CLI, routes, and runtime, but it consumes core TTS/STT capability instead of
inventing a second speech stack