Image generation: native provider migration and explicit capabilities (#49551)
* Docs: retire nano-banana skill wrapper
* Doctor: migrate nano-banana to native image generation
* Image generation: align fal aspect ratio behavior
* Image generation: make provider capabilities explicit
This commit is contained in:
parent
79f2173cd2
commit
21c2ba480a
@ -151,9 +151,12 @@ Docs: https://docs.openclaw.ai
### Breaking

- Skills/image generation: remove the bundled `nano-banana-pro` skill wrapper. Use `agents.defaults.imageGenerationModel.primary: "google/gemini-3-pro-image-preview"` for the native Nano Banana-style path instead.
- Browser/Chrome MCP: remove the legacy Chrome extension relay path, bundled extension assets, `driver: "extension"`, and `browser.relayBindHost`. Run `openclaw doctor --fix` to migrate host-local browser config to `existing-session` / `user`; Docker, headless, sandbox, and remote browser flows still use raw CDP. (#47893) Thanks @vincentkoc.
- Plugins/runtime: remove the public `openclaw/extension-api` surface with no compatibility shim. Bundled plugins must use the injected runtime for host-side operations (for example `api.runtime.agent.runEmbeddedPiAgent`), and any remaining direct imports must come from narrow `openclaw/plugin-sdk/*` subpaths instead of the monolithic SDK root.
- Tools/image generation: standardize the stock image create/edit path on the core `image_generate` tool. The old `nano-banana-pro` docs/examples are gone; if you previously copied that sample-skill config, switch to `agents.defaults.imageGenerationModel` for built-in image generation or install a separate third-party skill explicitly.
- Plugins/message discovery: require `ChannelMessageActionAdapter.describeMessageTool(...)` for shared `message` tool discovery. The legacy `listActions`, `getCapabilities`, and `getToolSchema` adapter methods are removed. Plugin authors should migrate message discovery to `describeMessageTool(...)` and keep channel-specific action runtime code inside the owning plugin package. Thanks @gumadeiras.

## 2026.3.13
@ -905,7 +905,9 @@ Time format in system prompt. Default: `auto` (OS preference).
  - Also used as fallback routing when the selected/default model cannot accept image input.
- `imageGenerationModel`: accepts either a string (`"provider/model"`) or an object (`{ primary, fallbacks }`).
  - Used by the shared image-generation capability and any future tool/plugin surface that generates images.
  - Typical values: `google/gemini-3-pro-image-preview` for the native Nano Banana-style flow, `fal/fal-ai/flux/dev` for fal, or `openai/gpt-image-1` for OpenAI Images.
  - If omitted, `image_generate` can still infer a best-effort provider default from compatible auth-backed image-generation providers.
- `pdfModel`: accepts either a string (`"provider/model"`) or an object (`{ primary, fallbacks }`).
  - Used by the `pdf` tool for model routing.
  - If omitted, the PDF tool falls back to `imageModel`, then to best-effort provider defaults.
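As a sketch of how these fields compose in `openclaw.json` (field names come from the docs above; the specific `pdfModel` value is a placeholder, not a recommendation):

```json5
{
  agents: {
    defaults: {
      // object form with fallback routing
      imageGenerationModel: {
        primary: "google/gemini-3-pro-image-preview",
        fallbacks: ["openai/gpt-image-1"],
      },
      // string form; replace with a real "provider/model" id
      pdfModel: "provider/model",
    },
  },
}
```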
@ -421,9 +421,24 @@ Notes:
- Use `action: "list"` to inspect registered providers, default models, supported model ids, sizes, resolutions, and edit support.
- Returns local `MEDIA:<path>` lines so channels can deliver the generated files directly.
- Uses the image-generation model directly (independent of the main chat model).
- Google-backed flows, including `google/gemini-3-pro-image-preview` for the native Nano Banana-style path, support reference-image edits plus explicit `1K|2K|4K` resolution hints.
- When editing and `resolution` is omitted, OpenClaw infers a draft/final resolution from the input image size.
- This is the built-in replacement for the old `nano-banana-pro` skill workflow. Use `agents.defaults.imageGenerationModel`, not `skills.entries`, for stock image generation.

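The edit-time resolution inference can be sketched as follows; the thresholds are taken from the retired `generate_image.py` script that appears later in this diff, so treat them as illustrative rather than a stable contract:

```python
def auto_detect_resolution(max_input_dim: int) -> str:
    """Map the largest input-image dimension to an output resolution hint."""
    if max_input_dim >= 3000:
        return "4K"
    if max_input_dim >= 1500:
        return "2K"
    return "1K"

# A 2200px-wide reference image edits at 2K
print(auto_detect_resolution(2200))  # prints 2K
```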
Native example:

```json5
{
  agents: {
    defaults: {
      imageGenerationModel: {
        primary: "google/gemini-3-pro-image-preview", // native Nano Banana path
        fallbacks: ["fal/fal-ai/flux/dev"],
      },
    },
  },
}
```

### `pdf`
@ -42,6 +42,11 @@ For built-in image generation/editing, prefer `agents.defaults.imageGenerationMo
plus the core `image_generate` tool. `skills.entries.*` is only for custom or
third-party skill workflows.

Examples:

- Native Nano Banana-style setup: `agents.defaults.imageGenerationModel.primary: "google/gemini-3-pro-image-preview"`
- Native fal setup: `agents.defaults.imageGenerationModel.primary: "fal/fal-ai/flux/dev"`

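A sketch of what those one-liners look like in config (the commented object form with a fallback is an illustrative assumption, not a recommendation):

```json5
{
  agents: {
    defaults: {
      // either documented native setup works as a plain string:
      imageGenerationModel: "google/gemini-3-pro-image-preview",
      // or, equivalently, the object form with fal as a fallback:
      // imageGenerationModel: {
      //   primary: "google/gemini-3-pro-image-preview",
      //   fallbacks: ["fal/fal-ai/flux/dev"],
      // },
    },
  },
}
```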
## Fields
- `allowBundled`: optional allowlist for **bundled** skills only. When set, only
@ -1,65 +0,0 @@
---
name: nano-banana-pro
description: Generate or edit images via Gemini 3 Pro Image (Nano Banana Pro).
homepage: https://ai.google.dev/
metadata:
  {
    "openclaw":
      {
        "emoji": "🍌",
        "requires": { "bins": ["uv"], "env": ["GEMINI_API_KEY"] },
        "primaryEnv": "GEMINI_API_KEY",
        "install":
          [
            {
              "id": "uv-brew",
              "kind": "brew",
              "formula": "uv",
              "bins": ["uv"],
              "label": "Install uv (brew)",
            },
          ],
      },
  }
---

# Nano Banana Pro (Gemini 3 Pro Image)

Use the bundled script to generate or edit images.

Generate

```bash
uv run {baseDir}/scripts/generate_image.py --prompt "your image description" --filename "output.png" --resolution 1K
```

Edit (single image)

```bash
uv run {baseDir}/scripts/generate_image.py --prompt "edit instructions" --filename "output.png" -i "/path/in.png" --resolution 2K
```

Multi-image composition (up to 14 images)

```bash
uv run {baseDir}/scripts/generate_image.py --prompt "combine these into one scene" --filename "output.png" -i img1.png -i img2.png -i img3.png
```

API key

- `GEMINI_API_KEY` env var
- Or set `skills."nano-banana-pro".apiKey` / `skills."nano-banana-pro".env.GEMINI_API_KEY` in `~/.openclaw/openclaw.json`

Specific aspect ratio (optional)

```bash
uv run {baseDir}/scripts/generate_image.py --prompt "portrait photo" --filename "output.png" --aspect-ratio 9:16
```

Notes

- Resolutions: `1K` (default), `2K`, `4K`.
- Aspect ratios: `1:1`, `2:3`, `3:2`, `3:4`, `4:3`, `4:5`, `5:4`, `9:16`, `16:9`, `21:9`. Without `--aspect-ratio` / `-a`, the model picks freely; use this flag for avatars, profile pics, or consistent batch generation.
- Use timestamps in filenames: `yyyy-mm-dd-hh-mm-ss-name.png`.
- The script prints a `MEDIA:` line for OpenClaw to auto-attach on supported chat providers.
- Do not read the image back; report the saved path only.
@ -1,235 +0,0 @@
#!/usr/bin/env python3
# /// script
# requires-python = ">=3.10"
# dependencies = [
#   "google-genai>=1.0.0",
#   "pillow>=10.0.0",
# ]
# ///
"""
Generate images using Google's Nano Banana Pro (Gemini 3 Pro Image) API.

Usage:
    uv run generate_image.py --prompt "your image description" --filename "output.png" [--resolution 1K|2K|4K] [--api-key KEY]

Multi-image editing (up to 14 images):
    uv run generate_image.py --prompt "combine these images" --filename "output.png" -i img1.png -i img2.png -i img3.png
"""

import argparse
import os
import sys
from pathlib import Path

SUPPORTED_ASPECT_RATIOS = [
    "1:1",
    "2:3",
    "3:2",
    "3:4",
    "4:3",
    "4:5",
    "5:4",
    "9:16",
    "16:9",
    "21:9",
]


def get_api_key(provided_key: str | None) -> str | None:
    """Get API key from argument first, then environment."""
    if provided_key:
        return provided_key
    return os.environ.get("GEMINI_API_KEY")


def auto_detect_resolution(max_input_dim: int) -> str:
    """Infer output resolution from the largest input image dimension."""
    if max_input_dim >= 3000:
        return "4K"
    if max_input_dim >= 1500:
        return "2K"
    return "1K"


def choose_output_resolution(
    requested_resolution: str | None,
    max_input_dim: int,
    has_input_images: bool,
) -> tuple[str, bool]:
    """Choose final resolution and whether it was auto-detected.

    Auto-detection is only applied when the user did not pass --resolution.
    """
    if requested_resolution is not None:
        return requested_resolution, False

    if has_input_images and max_input_dim > 0:
        return auto_detect_resolution(max_input_dim), True

    return "1K", False


def main():
    parser = argparse.ArgumentParser(
        description="Generate images using Nano Banana Pro (Gemini 3 Pro Image)"
    )
    parser.add_argument(
        "--prompt", "-p",
        required=True,
        help="Image description/prompt"
    )
    parser.add_argument(
        "--filename", "-f",
        required=True,
        help="Output filename (e.g., sunset-mountains.png)"
    )
    parser.add_argument(
        "--input-image", "-i",
        action="append",
        dest="input_images",
        metavar="IMAGE",
        help="Input image path(s) for editing/composition. Can be specified multiple times (up to 14 images)."
    )
    parser.add_argument(
        "--resolution", "-r",
        choices=["1K", "2K", "4K"],
        default=None,
        help="Output resolution: 1K, 2K, or 4K. If omitted with input images, auto-detect from largest image dimension."
    )
    parser.add_argument(
        "--aspect-ratio", "-a",
        choices=SUPPORTED_ASPECT_RATIOS,
        default=None,
        help=f"Output aspect ratio (default: model decides). Options: {', '.join(SUPPORTED_ASPECT_RATIOS)}"
    )
    parser.add_argument(
        "--api-key", "-k",
        help="Gemini API key (overrides GEMINI_API_KEY env var)"
    )

    args = parser.parse_args()

    # Get API key
    api_key = get_api_key(args.api_key)
    if not api_key:
        print("Error: No API key provided.", file=sys.stderr)
        print("Please either:", file=sys.stderr)
        print("  1. Provide --api-key argument", file=sys.stderr)
        print("  2. Set GEMINI_API_KEY environment variable", file=sys.stderr)
        sys.exit(1)

    # Import here after checking API key to avoid slow import on error
    from google import genai
    from google.genai import types
    from PIL import Image as PILImage

    # Initialise client
    client = genai.Client(api_key=api_key)

    # Set up output path
    output_path = Path(args.filename)
    output_path.parent.mkdir(parents=True, exist_ok=True)

    # Load input images if provided (up to 14 supported by Nano Banana Pro)
    input_images = []
    max_input_dim = 0
    if args.input_images:
        if len(args.input_images) > 14:
            print(f"Error: Too many input images ({len(args.input_images)}). Maximum is 14.", file=sys.stderr)
            sys.exit(1)

        for img_path in args.input_images:
            try:
                with PILImage.open(img_path) as img:
                    copied = img.copy()
                width, height = copied.size
                input_images.append(copied)
                print(f"Loaded input image: {img_path}")

                # Track largest dimension for auto-resolution
                max_input_dim = max(max_input_dim, width, height)
            except Exception as e:
                print(f"Error loading input image '{img_path}': {e}", file=sys.stderr)
                sys.exit(1)

    output_resolution, auto_detected = choose_output_resolution(
        requested_resolution=args.resolution,
        max_input_dim=max_input_dim,
        has_input_images=bool(input_images),
    )
    if auto_detected:
        print(
            f"Auto-detected resolution: {output_resolution} "
            f"(from max input dimension {max_input_dim})"
        )

    # Build contents (images first if editing, prompt only if generating)
    if input_images:
        contents = [*input_images, args.prompt]
        img_count = len(input_images)
        print(f"Processing {img_count} image{'s' if img_count > 1 else ''} with resolution {output_resolution}...")
    else:
        contents = args.prompt
        print(f"Generating image with resolution {output_resolution}...")

    try:
        # Build image config with optional aspect ratio
        image_cfg_kwargs = {"image_size": output_resolution}
        if args.aspect_ratio:
            image_cfg_kwargs["aspect_ratio"] = args.aspect_ratio

        response = client.models.generate_content(
            model="gemini-3-pro-image-preview",
            contents=contents,
            config=types.GenerateContentConfig(
                response_modalities=["TEXT", "IMAGE"],
                image_config=types.ImageConfig(**image_cfg_kwargs)
            )
        )

        # Process response and convert to PNG
        image_saved = False
        for part in response.parts:
            if part.text is not None:
                print(f"Model response: {part.text}")
            elif part.inline_data is not None:
                # Convert inline data to PIL Image and save as PNG
                from io import BytesIO

                # inline_data.data is already bytes, not base64
                image_data = part.inline_data.data
                if isinstance(image_data, str):
                    # If it's a string, it might be base64
                    import base64
                    image_data = base64.b64decode(image_data)

                image = PILImage.open(BytesIO(image_data))

                # Ensure RGB mode for PNG (convert RGBA to RGB with white background if needed)
                if image.mode == 'RGBA':
                    rgb_image = PILImage.new('RGB', image.size, (255, 255, 255))
                    rgb_image.paste(image, mask=image.split()[3])
                    rgb_image.save(str(output_path), 'PNG')
                elif image.mode == 'RGB':
                    image.save(str(output_path), 'PNG')
                else:
                    image.convert('RGB').save(str(output_path), 'PNG')
                image_saved = True

        if image_saved:
            full_path = output_path.resolve()
            print(f"\nImage saved: {full_path}")
            # OpenClaw parses MEDIA: tokens and will attach the file on
            # supported chat providers. Emit the canonical MEDIA:<path> form.
            print(f"MEDIA:{full_path}")
        else:
            print("Error: No image was generated in the response.", file=sys.stderr)
            sys.exit(1)

    except Exception as e:
        print(f"Error generating image: {e}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()
@ -1,36 +0,0 @@
import importlib.util
from pathlib import Path

import pytest

MODULE_PATH = Path(__file__).with_name("generate_image.py")
SPEC = importlib.util.spec_from_file_location("generate_image", MODULE_PATH)
assert SPEC and SPEC.loader
MODULE = importlib.util.module_from_spec(SPEC)
SPEC.loader.exec_module(MODULE)


@pytest.mark.parametrize(
    ("max_input_dim", "expected"),
    [
        (0, "1K"),
        (1499, "1K"),
        (1500, "2K"),
        (2999, "2K"),
        (3000, "4K"),
    ],
)
def test_auto_detect_resolution_thresholds(max_input_dim, expected):
    assert MODULE.auto_detect_resolution(max_input_dim) == expected


def test_choose_output_resolution_auto_detects_when_resolution_omitted():
    assert MODULE.choose_output_resolution(None, 2200, True) == ("2K", True)


def test_choose_output_resolution_defaults_to_1k_without_inputs():
    assert MODULE.choose_output_resolution(None, 0, False) == ("1K", False)


def test_choose_output_resolution_respects_explicit_1k_with_large_input():
    assert MODULE.choose_output_resolution("1K", 3500, True) == ("1K", False)
@ -14,8 +14,23 @@ function stubImageGenerationProviders() {
      id: "google",
      defaultModel: "gemini-3.1-flash-image-preview",
      models: ["gemini-3.1-flash-image-preview", "gemini-3-pro-image-preview"],
      supportedResolutions: ["1K", "2K", "4K"],
      supportsImageEditing: true,
      capabilities: {
        generate: {
          maxCount: 4,
          supportsAspectRatio: true,
          supportsResolution: true,
        },
        edit: {
          enabled: true,
          maxInputImages: 5,
          supportsAspectRatio: true,
          supportsResolution: true,
        },
        geometry: {
          resolutions: ["1K", "2K", "4K"],
          aspectRatios: ["1:1", "16:9"],
        },
      },
      generateImage: vi.fn(async () => {
        throw new Error("not used");
      }),
@ -24,8 +39,19 @@ function stubImageGenerationProviders() {
      id: "openai",
      defaultModel: "gpt-image-1",
      models: ["gpt-image-1"],
      supportedSizes: ["1024x1024", "1024x1536", "1536x1024"],
      supportsImageEditing: false,
      capabilities: {
        generate: {
          maxCount: 4,
          supportsSize: true,
        },
        edit: {
          enabled: false,
          maxInputImages: 0,
        },
        geometry: {
          sizes: ["1024x1024", "1024x1536", "1536x1024"],
        },
      },
      generateImage: vi.fn(async () => {
        throw new Error("not used");
      }),
@ -138,6 +164,7 @@ describe("createImageGenerateTool", () => {
    const result = await tool.execute("call-1", {
      prompt: "A cat wearing sunglasses",
      model: "openai/gpt-image-1",
      filename: "cats/output.png",
      count: 2,
      size: "1024x1024",
    });
@ -167,7 +194,7 @@ describe("createImageGenerateTool", () => {
      "image/png",
      "tool-image-generation",
      undefined,
      "cat-one.png",
      "cats/output.png",
    );
    expect(saveMediaBuffer).toHaveBeenNthCalledWith(
      2,
@ -175,7 +202,7 @@ describe("createImageGenerateTool", () => {
      "image/png",
      "tool-image-generation",
      undefined,
      "cat-two.png",
      "cats/output.png",
    );
    expect(result).toMatchObject({
      content: [
@ -189,6 +216,7 @@ describe("createImageGenerateTool", () => {
        model: "gpt-image-1",
        count: 2,
        paths: ["/tmp/generated-1.png", "/tmp/generated-2.png"],
        filename: "cats/output.png",
        revisedPrompts: ["A more cinematic cat"],
      },
    });
@ -273,6 +301,7 @@ describe("createImageGenerateTool", () => {
    expect(generateImage).toHaveBeenCalledWith(
      expect.objectContaining({
        aspectRatio: undefined,
        resolution: "4K",
        inputImages: [
          expect.objectContaining({
@ -284,6 +313,91 @@ describe("createImageGenerateTool", () => {
    );
  });

  it("forwards explicit aspect ratio and supports up to 5 reference images", async () => {
    const generateImage = vi.spyOn(imageGenerationRuntime, "generateImage").mockResolvedValue({
      provider: "google",
      model: "gemini-3-pro-image-preview",
      attempts: [],
      images: [
        {
          buffer: Buffer.from("png-out"),
          mimeType: "image/png",
          fileName: "edited.png",
        },
      ],
    });
    vi.spyOn(webMedia, "loadWebMedia").mockResolvedValue({
      kind: "image",
      buffer: Buffer.from("input-image"),
      contentType: "image/png",
    });
    vi.spyOn(mediaStore, "saveMediaBuffer").mockResolvedValue({
      path: "/tmp/edited.png",
      id: "edited.png",
      size: 7,
      contentType: "image/png",
    });

    const tool = createImageGenerateTool({
      config: {
        agents: {
          defaults: {
            imageGenerationModel: {
              primary: "google/gemini-3-pro-image-preview",
            },
          },
        },
      },
      workspaceDir: process.cwd(),
    });

    expect(tool).not.toBeNull();
    if (!tool) {
      throw new Error("expected image_generate tool");
    }

    const images = Array.from({ length: 5 }, (_, index) => `./fixtures/ref-${index + 1}.png`);
    await tool.execute("call-compose", {
      prompt: "Combine these into one scene",
      images,
      aspectRatio: "16:9",
    });

    expect(generateImage).toHaveBeenCalledWith(
      expect.objectContaining({
        aspectRatio: "16:9",
        inputImages: expect.arrayContaining([
          expect.objectContaining({ buffer: Buffer.from("input-image"), mimeType: "image/png" }),
        ]),
      }),
    );
    expect(generateImage.mock.calls[0]?.[0].inputImages).toHaveLength(5);
  });

  it("rejects unsupported aspect ratios", async () => {
    const tool = createImageGenerateTool({
      config: {
        agents: {
          defaults: {
            imageGenerationModel: {
              primary: "google/gemini-3-pro-image-preview",
            },
          },
        },
      },
    });

    expect(tool).not.toBeNull();
    if (!tool) {
      throw new Error("expected image_generate tool");
    }

    await expect(tool.execute("call-bad-aspect", { prompt: "portrait", aspectRatio: "7:5" }))
      .rejects.toThrow(
        "aspectRatio must be one of 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, or 21:9",
      );
  });

  it("lists registered provider and model options", async () => {
    stubImageGenerationProviders();
@ -310,7 +424,8 @@ describe("createImageGenerateTool", () => {
    expect(text).toContain("google (default gemini-3.1-flash-image-preview)");
    expect(text).toContain("gemini-3.1-flash-image-preview");
    expect(text).toContain("gemini-3-pro-image-preview");
    expect(text).toContain("editing up to 5 refs");
    expect(text).toContain("aspect ratios 1:1, 16:9");
    expect(result).toMatchObject({
      details: {
        providers: expect.arrayContaining([
@ -321,9 +436,139 @@ describe("createImageGenerateTool", () => {
              "gemini-3.1-flash-image-preview",
              "gemini-3-pro-image-preview",
            ]),
            capabilities: expect.objectContaining({
              edit: expect.objectContaining({
                enabled: true,
                maxInputImages: 5,
              }),
            }),
          }),
        ]),
      },
    });
  });

  it("rejects provider-specific edit limits before runtime", async () => {
    vi.spyOn(imageGenerationRuntime, "listRuntimeImageGenerationProviders").mockReturnValue([
      {
        id: "fal",
        defaultModel: "fal-ai/flux/dev",
        models: ["fal-ai/flux/dev", "fal-ai/flux/dev/image-to-image"],
        capabilities: {
          generate: {
            maxCount: 4,
            supportsSize: true,
            supportsAspectRatio: true,
            supportsResolution: true,
          },
          edit: {
            enabled: true,
            maxInputImages: 1,
            supportsSize: true,
            supportsAspectRatio: false,
            supportsResolution: true,
          },
        },
        generateImage: vi.fn(async () => {
          throw new Error("not used");
        }),
      },
    ]);
    const generateImage = vi.spyOn(imageGenerationRuntime, "generateImage");
    vi.spyOn(webMedia, "loadWebMedia").mockResolvedValue({
      kind: "image",
      buffer: Buffer.from("input-image"),
      contentType: "image/png",
    });

    const tool = createImageGenerateTool({
      config: {
        agents: {
          defaults: {
            imageGenerationModel: {
              primary: "fal/fal-ai/flux/dev",
            },
          },
        },
      },
      workspaceDir: process.cwd(),
    });

    expect(tool).not.toBeNull();
    if (!tool) {
      throw new Error("expected image_generate tool");
    }

    await expect(
      tool.execute("call-fal-edit", {
        prompt: "combine",
        images: ["./fixtures/a.png", "./fixtures/b.png"],
      }),
    ).rejects.toThrow("fal edit supports at most 1 reference image");
    expect(generateImage).not.toHaveBeenCalled();
  });

  it("rejects unsupported provider-specific edit aspect ratio overrides before runtime", async () => {
    vi.spyOn(imageGenerationRuntime, "listRuntimeImageGenerationProviders").mockReturnValue([
      {
        id: "fal",
        defaultModel: "fal-ai/flux/dev",
        models: ["fal-ai/flux/dev", "fal-ai/flux/dev/image-to-image"],
        capabilities: {
          generate: {
            maxCount: 4,
            supportsSize: true,
            supportsAspectRatio: true,
            supportsResolution: true,
          },
          edit: {
            enabled: true,
            maxInputImages: 1,
            supportsSize: true,
            supportsAspectRatio: false,
            supportsResolution: true,
          },
          geometry: {
            aspectRatios: ["1:1", "16:9"],
          },
        },
        generateImage: vi.fn(async () => {
          throw new Error("not used");
        }),
      },
    ]);
    const generateImage = vi.spyOn(imageGenerationRuntime, "generateImage");
    vi.spyOn(webMedia, "loadWebMedia").mockResolvedValue({
      kind: "image",
      buffer: Buffer.from("input-image"),
      contentType: "image/png",
    });

    const tool = createImageGenerateTool({
      config: {
        agents: {
          defaults: {
            imageGenerationModel: {
              primary: "fal/fal-ai/flux/dev",
            },
          },
        },
      },
      workspaceDir: process.cwd(),
    });

    expect(tool).not.toBeNull();
    if (!tool) {
      throw new Error("expected image_generate tool");
    }

    await expect(
      tool.execute("call-fal-aspect", {
        prompt: "edit",
        image: "./fixtures/a.png",
        aspectRatio: "16:9",
      }),
    ).rejects.toThrow("fal edit does not support aspectRatio overrides");
    expect(generateImage).not.toHaveBeenCalled();
  });
});
@ -6,6 +6,7 @@ import {
  listRuntimeImageGenerationProviders,
} from "../../image-generation/runtime.js";
import type {
  ImageGenerationProvider,
  ImageGenerationResolution,
  ImageGenerationSourceImage,
} from "../../image-generation/types.js";
@ -36,8 +37,20 @@ import {
const DEFAULT_COUNT = 1;
const MAX_COUNT = 4;
const MAX_INPUT_IMAGES = 5;
const DEFAULT_RESOLUTION: ImageGenerationResolution = "1K";
const SUPPORTED_ASPECT_RATIOS = new Set([
  "1:1",
  "2:3",
  "3:2",
  "3:4",
  "4:3",
  "4:5",
  "5:4",
  "9:16",
  "16:9",
  "21:9",
]);

const ImageGenerateToolSchema = Type.Object({
  action: Type.Optional(
@ -60,12 +73,24 @@ const ImageGenerateToolSchema = Type.Object({
  model: Type.Optional(
    Type.String({ description: "Optional provider/model override, e.g. openai/gpt-image-1." }),
  ),
  filename: Type.Optional(
    Type.String({
      description:
        "Optional output filename hint. OpenClaw preserves the basename and saves under its managed media directory.",
    }),
  ),
  size: Type.Optional(
    Type.String({
      description:
        "Optional size hint like 1024x1024, 1536x1024, 1024x1536, 1024x1792, or 1792x1024.",
    }),
  ),
  aspectRatio: Type.Optional(
    Type.String({
      description:
        "Optional aspect ratio hint: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, or 21:9.",
    }),
  ),
  resolution: Type.Optional(
    Type.String({
      description:
@ -162,6 +187,19 @@ function normalizeResolution(raw: string | undefined): ImageGenerationResolution
  throw new ToolInputError("resolution must be one of 1K, 2K, or 4K");
}

function normalizeAspectRatio(raw: string | undefined): string | undefined {
  const normalized = raw?.trim();
  if (!normalized) {
    return undefined;
  }
  if (SUPPORTED_ASPECT_RATIOS.has(normalized)) {
    return normalized;
  }
  throw new ToolInputError(
    "aspectRatio must be one of 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, or 21:9",
  );
}

function normalizeReferenceImages(args: Record<string, unknown>): string[] {
  const imageCandidates: string[] = [];
  if (typeof args.image === "string") {
@ -192,6 +230,112 @@ function normalizeReferenceImages(args: Record<string, unknown>): string[] {
  return normalized;
}

function parseImageGenerationModelRef(raw: string | undefined): { provider: string; model: string } | null {
  const trimmed = raw?.trim();
  if (!trimmed) {
    return null;
  }
  const slashIndex = trimmed.indexOf("/");
  if (slashIndex <= 0 || slashIndex === trimmed.length - 1) {
    return null;
  }
  return {
    provider: trimmed.slice(0, slashIndex).trim(),
    model: trimmed.slice(slashIndex + 1).trim(),
  };
}

function resolveSelectedImageGenerationProvider(params: {
  config?: OpenClawConfig;
  imageGenerationModelConfig: ToolModelConfig;
  modelOverride?: string;
}): ImageGenerationProvider | undefined {
  const selectedRef =
    parseImageGenerationModelRef(params.modelOverride) ??
    parseImageGenerationModelRef(params.imageGenerationModelConfig.primary);
  if (!selectedRef) {
    return undefined;
  }
  return listRuntimeImageGenerationProviders({ config: params.config }).find(
    (provider) =>
      provider.id === selectedRef.provider || (provider.aliases ?? []).includes(selectedRef.provider),
  );
}

function validateImageGenerationCapabilities(params: {
  provider: ImageGenerationProvider | undefined;
  count: number;
  inputImageCount: number;
  size?: string;
  aspectRatio?: string;
  resolution?: ImageGenerationResolution;
}) {
  const provider = params.provider;
  if (!provider) {
    return;
  }
  const isEdit = params.inputImageCount > 0;
  const modeCaps = isEdit ? provider.capabilities.edit : provider.capabilities.generate;
  const geometry = provider.capabilities.geometry;
  const maxCount = modeCaps.maxCount ?? MAX_COUNT;
  if (params.count > maxCount) {
    throw new ToolInputError(
      `${provider.id} ${isEdit ? "edit" : "generate"} supports at most ${maxCount} output image${maxCount === 1 ? "" : "s"}.`,
    );
  }

  if (isEdit) {
    if (!provider.capabilities.edit.enabled) {
      throw new ToolInputError(`${provider.id} does not support reference-image edits.`);
    }
    const maxInputImages = provider.capabilities.edit.maxInputImages ?? MAX_INPUT_IMAGES;
    if (params.inputImageCount > maxInputImages) {
      throw new ToolInputError(
        `${provider.id} edit supports at most ${maxInputImages} reference image${maxInputImages === 1 ? "" : "s"}.`,
      );
    }
  }

  if (params.size) {
    if (!modeCaps.supportsSize) {
      throw new ToolInputError(`${provider.id} ${isEdit ? "edit" : "generate"} does not support size overrides.`);
    }
    if ((geometry?.sizes?.length ?? 0) > 0 && !geometry?.sizes?.includes(params.size)) {
      throw new ToolInputError(
        `${provider.id} ${isEdit ? "edit" : "generate"} size must be one of ${geometry?.sizes?.join(", ")}.`,
      );
    }
  }

  if (params.aspectRatio) {
    if (!modeCaps.supportsAspectRatio) {
      throw new ToolInputError(`${provider.id} ${isEdit ? "edit" : "generate"} does not support aspectRatio overrides.`);
    }
    if (
      (geometry?.aspectRatios?.length ?? 0) > 0 &&
      !geometry?.aspectRatios?.includes(params.aspectRatio)
    ) {
      throw new ToolInputError(
        `${provider.id} ${isEdit ? "edit" : "generate"} aspectRatio must be one of ${geometry?.aspectRatios?.join(", ")}.`,
      );
    }
  }

  if (params.resolution) {
    if (!modeCaps.supportsResolution) {
      throw new ToolInputError(`${provider.id} ${isEdit ? "edit" : "generate"} does not support resolution overrides.`);
    }
    if (
      (geometry?.resolutions?.length ?? 0) > 0 &&
      !geometry?.resolutions?.includes(params.resolution)
    ) {
      throw new ToolInputError(
        `${provider.id} ${isEdit ? "edit" : "generate"} resolution must be one of ${geometry?.resolutions?.join("/")}.`,
      );
    }
  }
}

type ImageGenerateSandboxConfig = {
  root: string;
  bridge: SandboxFsBridge;
@ -357,25 +501,25 @@ export function createImageGenerateTool(options?: {
|
||||
        ...(provider.label ? { label: provider.label } : {}),
        ...(provider.defaultModel ? { defaultModel: provider.defaultModel } : {}),
        models: provider.models ?? (provider.defaultModel ? [provider.defaultModel] : []),
        ...(provider.supportedSizes ? { supportedSizes: [...provider.supportedSizes] } : {}),
        ...(provider.supportedResolutions
          ? { supportedResolutions: [...provider.supportedResolutions] }
          : {}),
        ...(typeof provider.supportsImageEditing === "boolean"
          ? { supportsImageEditing: provider.supportsImageEditing }
          : {}),
        capabilities: provider.capabilities,
      }),
    );
    const lines = providers.flatMap((provider) => {
      const caps: string[] = [];
      if (provider.supportsImageEditing) {
        caps.push("editing");
      if (provider.capabilities.edit.enabled) {
        const maxRefs = provider.capabilities.edit.maxInputImages;
        caps.push(
          `editing${typeof maxRefs === "number" ? ` up to ${maxRefs} ref${maxRefs === 1 ? "" : "s"}` : ""}`,
        );
      }
      if ((provider.supportedResolutions?.length ?? 0) > 0) {
        caps.push(`resolutions ${provider.supportedResolutions?.join("/")}`);
      if ((provider.capabilities.geometry?.resolutions?.length ?? 0) > 0) {
        caps.push(`resolutions ${provider.capabilities.geometry?.resolutions?.join("/")}`);
      }
      if ((provider.supportedSizes?.length ?? 0) > 0) {
        caps.push(`sizes ${provider.supportedSizes?.join(", ")}`);
      if ((provider.capabilities.geometry?.sizes?.length ?? 0) > 0) {
        caps.push(`sizes ${provider.capabilities.geometry?.sizes?.join(", ")}`);
      }
      if ((provider.capabilities.geometry?.aspectRatios?.length ?? 0) > 0) {
        caps.push(`aspect ratios ${provider.capabilities.geometry?.aspectRatios?.join(", ")}`);
      }
      const modelLine =
        provider.models.length > 0
@@ -396,7 +540,9 @@ export function createImageGenerateTool(options?: {
      const prompt = readStringParam(params, "prompt", { required: true });
      const imageInputs = normalizeReferenceImages(params);
      const model = readStringParam(params, "model");
      const filename = readStringParam(params, "filename");
      const size = readStringParam(params, "size");
      const aspectRatio = normalizeAspectRatio(readStringParam(params, "aspectRatio"));
      const explicitResolution = normalizeResolution(readStringParam(params, "resolution"));
      const count = resolveRequestedCount(params);
      const loadedReferenceImages = await loadReferenceImages({
@@ -412,6 +558,19 @@ export function createImageGenerateTool(options?: {
          : inputImages.length > 0
            ? await inferResolutionFromInputImages(inputImages)
            : undefined);
      const selectedProvider = resolveSelectedImageGenerationProvider({
        config: effectiveCfg,
        imageGenerationModelConfig,
        modelOverride: model,
      });
      validateImageGenerationCapabilities({
        provider: selectedProvider,
        count,
        inputImageCount: inputImages.length,
        size,
        aspectRatio,
        resolution,
      });

      const result = await generateImage({
        cfg: effectiveCfg,
@@ -419,6 +578,7 @@ export function createImageGenerateTool(options?: {
        agentDir: options?.agentDir,
        modelOverride: model,
        size,
        aspectRatio,
        resolution,
        count,
        inputImages,
@@ -431,7 +591,7 @@ export function createImageGenerateTool(options?: {
            image.mimeType,
            "tool-image-generation",
            undefined,
            image.fileName,
            filename || image.fileName,
          ),
        ),
      );
@@ -468,6 +628,8 @@ export function createImageGenerateTool(options?: {
          : {}),
        ...(resolution ? { resolution } : {}),
        ...(size ? { size } : {}),
        ...(aspectRatio ? { aspectRatio } : {}),
        ...(filename ? { filename } : {}),
        attempts: result.attempts,
        metadata: result.metadata,
        ...(revisedPrompts.length > 0 ? { revisedPrompts } : {}),

@@ -297,4 +297,99 @@ describe("normalizeCompatibilityConfigValues", () => {
      "Moved browser.ssrfPolicy.allowPrivateNetwork → browser.ssrfPolicy.dangerouslyAllowPrivateNetwork (true).",
    );
  });

  it("migrates nano-banana skill config to native image generation config", () => {
    const res = normalizeCompatibilityConfigValues({
      skills: {
        entries: {
          "nano-banana-pro": {
            enabled: true,
            apiKey: { source: "env", provider: "default", id: "GEMINI_API_KEY" },
          },
        },
      },
    });

    expect(res.config.agents?.defaults?.imageGenerationModel).toEqual({
      primary: "google/gemini-3-pro-image-preview",
    });
    expect(res.config.models?.providers?.google?.apiKey).toEqual({
      source: "env",
      provider: "default",
      id: "GEMINI_API_KEY",
    });
    expect(res.config.skills?.entries).toBeUndefined();
    expect(res.changes).toEqual([
      "Moved skills.entries.nano-banana-pro → agents.defaults.imageGenerationModel.primary (google/gemini-3-pro-image-preview).",
      "Moved skills.entries.nano-banana-pro.apiKey → models.providers.google.apiKey.",
      "Removed legacy skills.entries.nano-banana-pro.",
    ]);
  });

  it("prefers legacy nano-banana env.GEMINI_API_KEY over skill apiKey during migration", () => {
    const res = normalizeCompatibilityConfigValues({
      skills: {
        entries: {
          "nano-banana-pro": {
            apiKey: "ignored-skill-api-key",
            env: {
              GEMINI_API_KEY: "env-gemini-key",
            },
          },
        },
      },
    });

    expect(res.config.models?.providers?.google?.apiKey).toBe("env-gemini-key");
    expect(res.changes).toContain(
      "Moved skills.entries.nano-banana-pro.env.GEMINI_API_KEY → models.providers.google.apiKey.",
    );
  });

  it("preserves explicit native config while removing legacy nano-banana skill config", () => {
    const res = normalizeCompatibilityConfigValues({
      agents: {
        defaults: {
          imageGenerationModel: {
            primary: "fal/fal-ai/flux/dev",
          },
        },
      },
      models: {
        providers: {
          google: {
            apiKey: "existing-google-key",
          },
        },
      },
      skills: {
        entries: {
          "nano-banana-pro": {
            apiKey: "legacy-gemini-key",
          },
          peekaboo: { enabled: true },
        },
      },
    });

    expect(res.config.agents?.defaults?.imageGenerationModel).toEqual({
      primary: "fal/fal-ai/flux/dev",
    });
    expect(res.config.models?.providers?.google?.apiKey).toBe("existing-google-key");
    expect(res.config.skills?.entries).toEqual({
      peekaboo: { enabled: true },
    });
    expect(res.changes).toEqual(["Removed legacy skills.entries.nano-banana-pro."]);
  });

  it("removes nano-banana from skills.allowBundled during migration", () => {
    const res = normalizeCompatibilityConfigValues({
      skills: {
        allowBundled: ["peekaboo", "nano-banana-pro"],
      },
    });

    expect(res.config.skills?.allowBundled).toEqual(["peekaboo"]);
    expect(res.changes).toEqual(["Removed nano-banana-pro from skills.allowBundled."]);
  });
});

@@ -15,6 +15,8 @@ export function normalizeCompatibilityConfigValues(cfg: OpenClawConfig): {
  changes: string[];
} {
  const changes: string[] = [];
  const NANO_BANANA_SKILL_KEY = "nano-banana-pro";
  const NANO_BANANA_MODEL = "google/gemini-3-pro-image-preview";
  let next: OpenClawConfig = cfg;

  const isRecord = (value: unknown): value is Record<string, unknown> =>
@@ -471,7 +473,121 @@ export function normalizeCompatibilityConfigValues(cfg: OpenClawConfig): {
    );
  };

  const normalizeLegacyNanoBananaSkill = () => {
    const rawSkills = next.skills;
    if (!isRecord(rawSkills)) {
      return;
    }

    let skillsChanged = false;
    let skills = structuredClone(rawSkills);

    if (Array.isArray(skills.allowBundled)) {
      const allowBundled = skills.allowBundled.filter(
        (value) => typeof value !== "string" || value.trim() !== NANO_BANANA_SKILL_KEY,
      );
      if (allowBundled.length !== skills.allowBundled.length) {
        if (allowBundled.length === 0) {
          delete skills.allowBundled;
          changes.push(`Removed skills.allowBundled entry for ${NANO_BANANA_SKILL_KEY}.`);
        } else {
          skills.allowBundled = allowBundled;
          changes.push(`Removed ${NANO_BANANA_SKILL_KEY} from skills.allowBundled.`);
        }
        skillsChanged = true;
      }
    }

    const rawEntries = skills.entries;
    if (!isRecord(rawEntries)) {
      if (skillsChanged) {
        next = { ...next, skills };
      }
      return;
    }

    const rawLegacyEntry = rawEntries[NANO_BANANA_SKILL_KEY];
    if (!isRecord(rawLegacyEntry)) {
      if (skillsChanged) {
        next = { ...next, skills };
      }
      return;
    }

    const existingImageGenerationModel = next.agents?.defaults?.imageGenerationModel;
    if (existingImageGenerationModel === undefined) {
      next = {
        ...next,
        agents: {
          ...next.agents,
          defaults: {
            ...next.agents?.defaults,
            imageGenerationModel: {
              primary: NANO_BANANA_MODEL,
            },
          },
        },
      };
      changes.push(
        `Moved skills.entries.${NANO_BANANA_SKILL_KEY} → agents.defaults.imageGenerationModel.primary (${NANO_BANANA_MODEL}).`,
      );
    }

    const legacyEnv = isRecord(rawLegacyEntry.env) ? rawLegacyEntry.env : undefined;
    const legacyEnvApiKey =
      typeof legacyEnv?.GEMINI_API_KEY === "string" ? legacyEnv.GEMINI_API_KEY.trim() : "";
    const legacyApiKey =
      legacyEnvApiKey ||
      (typeof rawLegacyEntry.apiKey === "string"
        ? rawLegacyEntry.apiKey.trim()
        : rawLegacyEntry.apiKey && isRecord(rawLegacyEntry.apiKey)
          ? structuredClone(rawLegacyEntry.apiKey)
          : undefined);

    const rawModels = isRecord(next.models) ? structuredClone(next.models) : {};
    const rawProviders = isRecord(rawModels.providers) ? { ...rawModels.providers } : {};
    const rawGoogle = isRecord(rawProviders.google) ? { ...rawProviders.google } : {};
    const hasGoogleApiKey = rawGoogle.apiKey !== undefined;
    if (!hasGoogleApiKey && legacyApiKey) {
      rawGoogle.apiKey = legacyApiKey;
      rawProviders.google = rawGoogle;
      rawModels.providers = rawProviders;
      next = {
        ...next,
        models: rawModels as OpenClawConfig["models"],
      };
      changes.push(
        `Moved skills.entries.${NANO_BANANA_SKILL_KEY}.${legacyEnvApiKey ? "env.GEMINI_API_KEY" : "apiKey"} → models.providers.google.apiKey.`,
      );
    }

    const entries = { ...rawEntries };
    delete entries[NANO_BANANA_SKILL_KEY];
    if (Object.keys(entries).length === 0) {
      delete skills.entries;
      changes.push(`Removed legacy skills.entries.${NANO_BANANA_SKILL_KEY}.`);
    } else {
      skills.entries = entries;
      changes.push(`Removed legacy skills.entries.${NANO_BANANA_SKILL_KEY}.`);
    }
    skillsChanged = true;

    if (Object.keys(skills).length === 0) {
      const { skills: _ignored, ...rest } = next;
      next = rest;
      return;
    }

    if (skillsChanged) {
      next = {
        ...next,
        skills,
      };
    }
  };

  normalizeBrowserSsrFPolicyAlias();
  normalizeLegacyNanoBananaSkill();

  const legacyAckReaction = cfg.messages?.ackReaction?.trim();
  const hasWhatsAppConfig = cfg.channels?.whatsapp !== undefined;

@@ -127,6 +127,97 @@ describe("fal image-generation provider", () => {
    );
  });

  it("maps aspect ratio for text generation without forcing a square default", async () => {
    vi.spyOn(modelAuth, "resolveApiKeyForProvider").mockResolvedValue({
      apiKey: "fal-test-key",
      source: "env",
      mode: "api-key",
    });
    const fetchMock = vi
      .fn()
      .mockResolvedValueOnce({
        ok: true,
        json: async () => ({
          images: [{ url: "https://v3.fal.media/files/example/wide.png" }],
        }),
      })
      .mockResolvedValueOnce({
        ok: true,
        headers: new Headers({ "content-type": "image/png" }),
        arrayBuffer: async () => Buffer.from("wide-data"),
      });
    vi.stubGlobal("fetch", fetchMock);

    const provider = buildFalImageGenerationProvider();
    await provider.generateImage({
      provider: "fal",
      model: "fal-ai/flux/dev",
      prompt: "wide cinematic shot",
      cfg: {},
      aspectRatio: "16:9",
    });

    expect(fetchMock).toHaveBeenNthCalledWith(
      1,
      "https://fal.run/fal-ai/flux/dev",
      expect.objectContaining({
        method: "POST",
        body: JSON.stringify({
          prompt: "wide cinematic shot",
          image_size: "landscape_16_9",
          num_images: 1,
          output_format: "png",
        }),
      }),
    );
  });

  it("combines resolution and aspect ratio for text generation", async () => {
    vi.spyOn(modelAuth, "resolveApiKeyForProvider").mockResolvedValue({
      apiKey: "fal-test-key",
      source: "env",
      mode: "api-key",
    });
    const fetchMock = vi
      .fn()
      .mockResolvedValueOnce({
        ok: true,
        json: async () => ({
          images: [{ url: "https://v3.fal.media/files/example/portrait.png" }],
        }),
      })
      .mockResolvedValueOnce({
        ok: true,
        headers: new Headers({ "content-type": "image/png" }),
        arrayBuffer: async () => Buffer.from("portrait-data"),
      });
    vi.stubGlobal("fetch", fetchMock);

    const provider = buildFalImageGenerationProvider();
    await provider.generateImage({
      provider: "fal",
      model: "fal-ai/flux/dev",
      prompt: "portrait poster",
      cfg: {},
      resolution: "2K",
      aspectRatio: "9:16",
    });

    expect(fetchMock).toHaveBeenNthCalledWith(
      1,
      "https://fal.run/fal-ai/flux/dev",
      expect.objectContaining({
        method: "POST",
        body: JSON.stringify({
          prompt: "portrait poster",
          image_size: { width: 1152, height: 2048 },
          num_images: 1,
          output_format: "png",
        }),
      }),
    );
  });

  it("rejects multi-image edit requests for now", async () => {
    vi.spyOn(modelAuth, "resolveApiKeyForProvider").mockResolvedValue({
      apiKey: "fal-test-key",
@@ -148,4 +239,24 @@ describe("fal image-generation provider", () => {
      }),
    ).rejects.toThrow("at most one reference image");
  });

  it("rejects aspect ratio overrides for the current edit endpoint", async () => {
    vi.spyOn(modelAuth, "resolveApiKeyForProvider").mockResolvedValue({
      apiKey: "fal-test-key",
      source: "env",
      mode: "api-key",
    });

    const provider = buildFalImageGenerationProvider();
    await expect(
      provider.generateImage({
        provider: "fal",
        model: "fal-ai/flux/dev",
        prompt: "make it widescreen",
        cfg: {},
        aspectRatio: "16:9",
        inputImages: [{ buffer: Buffer.from("one"), mimeType: "image/png" }],
      }),
    ).rejects.toThrow("does not support aspectRatio overrides");
  });
});

@@ -5,8 +5,15 @@ import type { GeneratedImageAsset } from "../types.js";
const DEFAULT_FAL_BASE_URL = "https://fal.run";
const DEFAULT_FAL_IMAGE_MODEL = "fal-ai/flux/dev";
const DEFAULT_FAL_EDIT_SUBPATH = "image-to-image";
const DEFAULT_OUTPUT_SIZE = "square_hd";
const DEFAULT_OUTPUT_FORMAT = "png";
const FAL_SUPPORTED_SIZES = [
  "1024x1024",
  "1024x1536",
  "1536x1024",
  "1024x1792",
  "1792x1024",
] as const;
const FAL_SUPPORTED_ASPECT_RATIOS = ["1:1", "4:3", "3:4", "16:9", "9:16"] as const;

type FalGeneratedImage = {
  url?: string;
@@ -57,23 +64,85 @@ function parseSize(raw: string | undefined): { width: number; height: number } |
  return { width, height };
}

function mapResolutionToSize(resolution: "1K" | "2K" | "4K" | undefined): FalImageSize | undefined {
function mapResolutionToEdge(resolution: "1K" | "2K" | "4K" | undefined): number | undefined {
  if (!resolution) {
    return undefined;
  }
  const edge = resolution === "4K" ? 4096 : resolution === "2K" ? 2048 : 1024;
  return { width: edge, height: edge };
  return resolution === "4K" ? 4096 : resolution === "2K" ? 2048 : 1024;
}

function aspectRatioToEnum(aspectRatio: string | undefined): string | undefined {
  const normalized = aspectRatio?.trim();
  if (!normalized) {
    return undefined;
  }
  if (normalized === "1:1") {
    return "square_hd";
  }
  if (normalized === "4:3") {
    return "landscape_4_3";
  }
  if (normalized === "3:4") {
    return "portrait_4_3";
  }
  if (normalized === "16:9") {
    return "landscape_16_9";
  }
  if (normalized === "9:16") {
    return "portrait_16_9";
  }
  return undefined;
}

function aspectRatioToDimensions(aspectRatio: string, edge: number): { width: number; height: number } {
  const match = /^(\d+):(\d+)$/u.exec(aspectRatio.trim());
  if (!match) {
    throw new Error(`Invalid fal aspect ratio: ${aspectRatio}`);
  }
  const widthRatio = Number.parseInt(match[1] ?? "", 10);
  const heightRatio = Number.parseInt(match[2] ?? "", 10);
  if (!Number.isFinite(widthRatio) || !Number.isFinite(heightRatio) || widthRatio <= 0 || heightRatio <= 0) {
    throw new Error(`Invalid fal aspect ratio: ${aspectRatio}`);
  }
  if (widthRatio >= heightRatio) {
    return {
      width: edge,
      height: Math.max(1, Math.round((edge * heightRatio) / widthRatio)),
    };
  }
  return {
    width: Math.max(1, Math.round((edge * widthRatio) / heightRatio)),
    height: edge,
  };
}

function resolveFalImageSize(params: {
  size?: string;
  resolution?: "1K" | "2K" | "4K";
}): FalImageSize {
  aspectRatio?: string;
  hasInputImages: boolean;
}): FalImageSize | undefined {
  const parsed = parseSize(params.size);
  if (parsed) {
    return parsed;
  }
  return mapResolutionToSize(params.resolution) ?? DEFAULT_OUTPUT_SIZE;

  const normalizedAspectRatio = params.aspectRatio?.trim();
  if (normalizedAspectRatio && params.hasInputImages) {
    throw new Error("fal image edit endpoint does not support aspectRatio overrides");
  }

  const edge = mapResolutionToEdge(params.resolution);
  if (normalizedAspectRatio && edge) {
    return aspectRatioToDimensions(normalizedAspectRatio, edge);
  }
  if (edge) {
    return { width: edge, height: edge };
  }
  if (normalizedAspectRatio) {
    return aspectRatioToEnum(normalizedAspectRatio) ?? aspectRatioToDimensions(normalizedAspectRatio, 1024);
  }
  return undefined;
}

function toDataUri(buffer: Buffer, mimeType: string): string {
@@ -111,9 +180,27 @@ export function buildFalImageGenerationProvider(): ImageGenerationProviderPlugin
    label: "fal",
    defaultModel: DEFAULT_FAL_IMAGE_MODEL,
    models: [DEFAULT_FAL_IMAGE_MODEL, `${DEFAULT_FAL_IMAGE_MODEL}/${DEFAULT_FAL_EDIT_SUBPATH}`],
    supportedSizes: ["1024x1024", "1024x1536", "1536x1024", "1024x1792", "1792x1024"],
    supportedResolutions: ["1K", "2K", "4K"],
    supportsImageEditing: true,
    capabilities: {
      generate: {
        maxCount: 4,
        supportsSize: true,
        supportsAspectRatio: true,
        supportsResolution: true,
      },
      edit: {
        enabled: true,
        maxCount: 4,
        maxInputImages: 1,
        supportsSize: true,
        supportsAspectRatio: false,
        supportsResolution: true,
      },
      geometry: {
        sizes: [...FAL_SUPPORTED_SIZES],
        aspectRatios: [...FAL_SUPPORTED_ASPECT_RATIOS],
        resolutions: ["1K", "2K", "4K"],
      },
    },
    async generateImage(req) {
      const auth = await resolveApiKeyForProvider({
        provider: "fal",
@@ -128,18 +215,22 @@ export function buildFalImageGenerationProvider(): ImageGenerationProviderPlugin
        throw new Error("fal image generation currently supports at most one reference image");
      }

      const hasInputImages = (req.inputImages?.length ?? 0) > 0;
      const imageSize = resolveFalImageSize({
        size: req.size,
        resolution: req.resolution,
        aspectRatio: req.aspectRatio,
        hasInputImages,
      });
      const hasInputImages = (req.inputImages?.length ?? 0) > 0;
      const model = ensureFalModelPath(req.model, hasInputImages);
      const requestBody: Record<string, unknown> = {
        prompt: req.prompt,
        image_size: imageSize,
        num_images: req.count ?? 1,
        output_format: DEFAULT_OUTPUT_FORMAT,
      };
      if (imageSize !== undefined) {
        requestBody.image_size = imageSize;
      }

      if (hasInputImages) {
        const [input] = req.inputImages ?? [];

@@ -197,7 +197,6 @@ describe("Google image-generation provider", () => {
          generationConfig: {
            responseModalities: ["TEXT", "IMAGE"],
            imageConfig: {
              aspectRatio: "1:1",
              imageSize: "4K",
            },
          },
@@ -205,4 +204,62 @@ describe("Google image-generation provider", () => {
      }),
    );
  });

  it("forwards explicit aspect ratio without forcing a default when size is omitted", async () => {
    vi.spyOn(modelAuth, "resolveApiKeyForProvider").mockResolvedValue({
      apiKey: "google-test-key",
      source: "env",
      mode: "api-key",
    });
    const fetchMock = vi.fn().mockResolvedValue({
      ok: true,
      json: async () => ({
        candidates: [
          {
            content: {
              parts: [
                {
                  inlineData: {
                    mimeType: "image/png",
                    data: Buffer.from("png-data").toString("base64"),
                  },
                },
              ],
            },
          },
        ],
      }),
    });
    vi.stubGlobal("fetch", fetchMock);

    const provider = buildGoogleImageGenerationProvider();
    await provider.generateImage({
      provider: "google",
      model: "gemini-3-pro-image-preview",
      prompt: "portrait photo",
      cfg: {},
      aspectRatio: "9:16",
    });

    expect(fetchMock).toHaveBeenCalledWith(
      "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent",
      expect.objectContaining({
        method: "POST",
        body: JSON.stringify({
          contents: [
            {
              role: "user",
              parts: [{ text: "portrait photo" }],
            },
          ],
          generationConfig: {
            responseModalities: ["TEXT", "IMAGE"],
            imageConfig: {
              aspectRatio: "9:16",
            },
          },
        }),
      }),
    );
  });
});

@ -11,7 +11,25 @@ import type { ImageGenerationProviderPlugin } from "../../plugins/types.js";
const DEFAULT_GOOGLE_IMAGE_BASE_URL = "https://generativelanguage.googleapis.com/v1beta";
const DEFAULT_GOOGLE_IMAGE_MODEL = "gemini-3.1-flash-image-preview";
const DEFAULT_OUTPUT_MIME = "image/png";
const DEFAULT_ASPECT_RATIO = "1:1";
const GOOGLE_SUPPORTED_SIZES = [
  "1024x1024",
  "1024x1536",
  "1536x1024",
  "1024x1792",
  "1792x1024",
] as const;
const GOOGLE_SUPPORTED_ASPECT_RATIOS = [
  "1:1",
  "2:3",
  "3:2",
  "3:4",
  "4:3",
  "4:5",
  "5:4",
  "9:16",
  "16:9",
  "21:9",
] as const;

type GoogleInlineDataPart = {
  mimeType?: string;
@@ -46,7 +64,7 @@ function mapSizeToImageConfig(
): { aspectRatio?: string; imageSize?: "2K" | "4K" } | undefined {
  const trimmed = size?.trim();
  if (!trimmed) {
    return { aspectRatio: DEFAULT_ASPECT_RATIO };
    return undefined;
  }

  const normalized = trimmed.toLowerCase();
@@ -81,8 +99,27 @@ export function buildGoogleImageGenerationProvider(): ImageGenerationProviderPlu
    label: "Google",
    defaultModel: DEFAULT_GOOGLE_IMAGE_MODEL,
    models: [DEFAULT_GOOGLE_IMAGE_MODEL, "gemini-3-pro-image-preview"],
    supportedResolutions: ["1K", "2K", "4K"],
    supportsImageEditing: true,
    capabilities: {
      generate: {
        maxCount: 4,
        supportsSize: true,
        supportsAspectRatio: true,
        supportsResolution: true,
      },
      edit: {
        enabled: true,
        maxCount: 4,
        maxInputImages: 5,
        supportsSize: true,
        supportsAspectRatio: true,
        supportsResolution: true,
      },
      geometry: {
        sizes: [...GOOGLE_SUPPORTED_SIZES],
        aspectRatios: [...GOOGLE_SUPPORTED_ASPECT_RATIOS],
        resolutions: ["1K", "2K", "4K"],
      },
    },
    async generateImage(req) {
      const auth = await resolveApiKeyForProvider({
        provider: "google",
@@ -111,6 +148,7 @@ export function buildGoogleImageGenerationProvider(): ImageGenerationProviderPlu
      }));
      const resolvedImageConfig = {
        ...imageConfig,
        ...(req.aspectRatio?.trim() ? { aspectRatio: req.aspectRatio.trim() } : {}),
        ...(req.resolution ? { imageSize: req.resolution } : {}),
      };

@@ -5,6 +5,7 @@ const DEFAULT_OPENAI_IMAGE_BASE_URL = "https://api.openai.com/v1";
const DEFAULT_OPENAI_IMAGE_MODEL = "gpt-image-1";
const DEFAULT_OUTPUT_MIME = "image/png";
const DEFAULT_SIZE = "1024x1024";
const OPENAI_SUPPORTED_SIZES = ["1024x1024", "1024x1536", "1536x1024"] as const;

type OpenAIImageApiResponse = {
  data?: Array<{
@@ -24,7 +25,25 @@ export function buildOpenAIImageGenerationProvider(): ImageGenerationProviderPlu
    label: "OpenAI",
    defaultModel: DEFAULT_OPENAI_IMAGE_MODEL,
    models: [DEFAULT_OPENAI_IMAGE_MODEL],
    supportedSizes: ["1024x1024", "1024x1536", "1536x1024"],
    capabilities: {
      generate: {
        maxCount: 4,
        supportsSize: true,
        supportsAspectRatio: false,
        supportsResolution: false,
      },
      edit: {
        enabled: false,
        maxCount: 0,
        maxInputImages: 0,
        supportsSize: false,
        supportsAspectRatio: false,
        supportsResolution: false,
      },
      geometry: {
        sizes: [...OPENAI_SUPPORTED_SIZES],
      },
    },
    async generateImage(req) {
      if ((req.inputImages?.length ?? 0) > 0) {
        throw new Error("OpenAI image generation provider does not support reference-image edits");

@@ -19,6 +19,10 @@ describe("image-generation runtime helpers", () => {
        source: "test",
        provider: {
          id: "image-plugin",
          capabilities: {
            generate: {},
            edit: { enabled: false },
          },
          async generateImage(req) {
            seenAuthStore = req.authStore;
            return {
@@ -76,7 +80,18 @@ describe("image-generation runtime helpers", () => {
        id: "image-plugin",
        defaultModel: "img-v1",
        models: ["img-v1", "img-v2"],
        supportedResolutions: ["1K", "2K"],
        capabilities: {
          generate: {
            supportsResolution: true,
          },
          edit: {
            enabled: true,
            maxInputImages: 3,
          },
          geometry: {
            resolutions: ["1K", "2K"],
          },
        },
        generateImage: async () => ({
          images: [{ buffer: Buffer.from("x"), mimeType: "image/png" }],
        }),
@@ -89,7 +104,18 @@ describe("image-generation runtime helpers", () => {
        id: "image-plugin",
        defaultModel: "img-v1",
        models: ["img-v1", "img-v2"],
        supportedResolutions: ["1K", "2K"],
        capabilities: {
          generate: {
            supportsResolution: true,
          },
          edit: {
            enabled: true,
            maxInputImages: 3,
          },
          geometry: {
            resolutions: ["1K", "2K"],
          },
        },
      },
    ]);
  });

@@ -25,6 +25,7 @@ export type GenerateImageParams = {
  modelOverride?: string;
  count?: number;
  size?: string;
  aspectRatio?: string;
  resolution?: ImageGenerationResolution;
  inputImages?: ImageGenerationSourceImage[];
};
@@ -142,6 +143,7 @@ export async function generateImage(
    authStore: params.authStore,
    count: params.count,
    size: params.size,
    aspectRatio: params.aspectRatio,
    resolution: params.resolution,
    inputImages: params.inputImages,
  });

@@ -27,6 +27,7 @@ export type ImageGenerationRequest = {
  authStore?: AuthProfileStore;
  count?: number;
  size?: string;
  aspectRatio?: string;
  resolution?: ImageGenerationResolution;
  inputImages?: ImageGenerationSourceImage[];
};
@@ -37,14 +38,36 @@ export type ImageGenerationResult = {
  metadata?: Record<string, unknown>;
};

export type ImageGenerationModeCapabilities = {
  maxCount?: number;
  supportsSize?: boolean;
  supportsAspectRatio?: boolean;
  supportsResolution?: boolean;
};

export type ImageGenerationEditCapabilities = ImageGenerationModeCapabilities & {
  enabled: boolean;
  maxInputImages?: number;
};

export type ImageGenerationGeometryCapabilities = {
  sizes?: string[];
  aspectRatios?: string[];
  resolutions?: ImageGenerationResolution[];
};

export type ImageGenerationProviderCapabilities = {
  generate: ImageGenerationModeCapabilities;
  edit: ImageGenerationEditCapabilities;
  geometry?: ImageGenerationGeometryCapabilities;
};

export type ImageGenerationProvider = {
  id: string;
  aliases?: string[];
  label?: string;
  defaultModel?: string;
  models?: string[];
  supportedSizes?: string[];
  supportedResolutions?: ImageGenerationResolution[];
  supportsImageEditing?: boolean;
  capabilities: ImageGenerationProviderCapabilities;
  generateImage: (req: ImageGenerationRequest) => Promise<ImageGenerationResult>;
};
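The explicit per-mode capability contract above replaces scattered legacy flags (`supportsImageEditing`, `supportedSizes`, `supportedResolutions`) with one declaration that the core tool can validate against before dispatching. A minimal standalone sketch of that pattern (the types and the `validate` helper here are simplified, hypothetical stand-ins for `ImageGenerationProviderCapabilities` and `validateImageGenerationCapabilities`, not the shipped implementations):

```typescript
// Simplified mirror of the capability types: each mode declares its own limits.
type ModeCaps = {
  maxCount?: number;
  supportsAspectRatio?: boolean;
};
type EditCaps = ModeCaps & { enabled: boolean; maxInputImages?: number };
type Caps = {
  generate: ModeCaps;
  edit: EditCaps;
  geometry?: { aspectRatios?: string[] };
};

// Check a request against declared capabilities; returns an error string or null.
function validate(
  caps: Caps,
  req: { count: number; inputImages: number; aspectRatio?: string },
): string | null {
  const isEdit = req.inputImages > 0;
  const mode = isEdit ? caps.edit : caps.generate;
  if (isEdit && !caps.edit.enabled) return "edits not supported";
  if (req.count > (mode.maxCount ?? 4)) return "too many outputs";
  if (isEdit && req.inputImages > (caps.edit.maxInputImages ?? 1)) {
    return "too many reference images";
  }
  if (req.aspectRatio && !mode.supportsAspectRatio) return "aspectRatio not supported";
  if (
    req.aspectRatio &&
    (caps.geometry?.aspectRatios?.length ?? 0) > 0 &&
    !caps.geometry?.aspectRatios?.includes(req.aspectRatio)
  ) {
    return "unsupported aspectRatio";
  }
  return null;
}

// fal-style declaration: generation accepts aspect ratios, the edit path does not.
const falLike: Caps = {
  generate: { maxCount: 4, supportsAspectRatio: true },
  edit: { enabled: true, maxCount: 4, maxInputImages: 1, supportsAspectRatio: false },
  geometry: { aspectRatios: ["1:1", "16:9", "9:16"] },
};

console.log(validate(falLike, { count: 1, inputImages: 0, aspectRatio: "16:9" })); // null
console.log(validate(falLike, { count: 1, inputImages: 1, aspectRatio: "16:9" })); // "aspectRatio not supported"
```

The point of the design is that the same generic check covers every provider; a request that a provider cannot honor fails up front with a capability-specific message instead of surfacing as a provider-side API error.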