Image generation: native provider migration and explicit capabilities (#49551)

* Docs: retire nano-banana skill wrapper

* Doctor: migrate nano-banana to native image generation

* Image generation: align fal aspect ratio behavior

* Image generation: make provider capabilities explicit
Vincent Koc 2026-03-18 00:04:03 -07:00 committed by GitHub
parent 79f2173cd2
commit 21c2ba480a
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
19 changed files with 1056 additions and 382 deletions


@ -151,9 +151,12 @@ Docs: https://docs.openclaw.ai
### Breaking
- Skills/image generation: remove the bundled `nano-banana-pro` skill wrapper. Use `agents.defaults.imageGenerationModel.primary: "google/gemini-3-pro-image-preview"` for the native Nano Banana-style path instead.
- Browser/Chrome MCP: remove the legacy Chrome extension relay path, bundled extension assets, `driver: "extension"`, and `browser.relayBindHost`. Run `openclaw doctor --fix` to migrate host-local browser config to `existing-session` / `user`; Docker, headless, sandbox, and remote browser flows still use raw CDP. (#47893) Thanks @vincentkoc.
- Plugins/runtime: remove the public `openclaw/extension-api` surface with no compatibility shim. Bundled plugins must use injected runtime for host-side operations (for example `api.runtime.agent.runEmbeddedPiAgent`) and any remaining direct imports must come from narrow `openclaw/plugin-sdk/*` subpaths instead of the monolithic SDK root.
- Tools/image generation: standardize the stock image create/edit path on the core `image_generate` tool. The old `nano-banana-pro` docs/examples are gone; if you previously copied that sample-skill config, switch to `agents.defaults.imageGenerationModel` for built-in image generation or install a separate third-party skill explicitly.
- Plugins/message discovery: require `ChannelMessageActionAdapter.describeMessageTool(...)` for shared `message` tool discovery. The legacy `listActions`, `getCapabilities`, and `getToolSchema` adapter methods are removed. Plugin authors should migrate message discovery to `describeMessageTool(...)` and keep channel-specific action runtime code inside the owning plugin package. Thanks @gumadeiras.
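A rough sketch of what the adapter migration can look like; the interface fields and method body here are illustrative assumptions, not the real plugin SDK types — consult the `openclaw/plugin-sdk` type definitions for the actual `describeMessageTool(...)` shape:

```typescript
// Hypothetical shapes for illustration only; field names are assumptions.
interface MessageToolDescription {
  name: string;
  description: string;
  actions: string[];
}

class MyChannelAdapter {
  // Before: discovery was spread across listActions / getCapabilities /
  // getToolSchema. After: a single describeMessageTool(...) entry point
  // owns message discovery, while action runtime code stays in the plugin.
  describeMessageTool(): MessageToolDescription {
    return {
      name: "message",
      description: "Send and react to channel messages",
      actions: ["send", "react"],
    };
  }
}
```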
## 2026.3.13


@ -905,7 +905,9 @@ Time format in system prompt. Default: `auto` (OS preference).
- Also used as fallback routing when the selected/default model cannot accept image input.
- `imageGenerationModel`: accepts either a string (`"provider/model"`) or an object (`{ primary, fallbacks }`).
- Used by the shared image-generation capability and any future tool/plugin surface that generates images.
- Typical values: `google/gemini-3-pro-image-preview` for the native Nano Banana-style flow, `fal/fal-ai/flux/dev` for fal, or `openai/gpt-image-1` for OpenAI Images.
- If omitted, `image_generate` can still infer a best-effort provider default from compatible auth-backed image-generation providers.
- Typical values: `google/gemini-3-pro-image-preview`, `fal/fal-ai/flux/dev`, `openai/gpt-image-1`.
- `pdfModel`: accepts either a string (`"provider/model"`) or an object (`{ primary, fallbacks }`).
- Used by the `pdf` tool for model routing.
- If omitted, the PDF tool falls back to `imageModel`, then to best-effort provider defaults.
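The routing fields above can be combined in one config; a minimal sketch (model ids are illustrative, taken from the typical values listed above):

```json5
{
  agents: {
    defaults: {
      // Primary plus ordered fallbacks for image generation.
      imageGenerationModel: {
        primary: "google/gemini-3-pro-image-preview",
        fallbacks: ["fal/fal-ai/flux/dev", "openai/gpt-image-1"],
      },
      // If pdfModel were omitted, the pdf tool would fall back to
      // imageModel, then to best-effort provider defaults.
      pdfModel: "google/gemini-3-pro-image-preview",
    },
  },
}
```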


@ -421,9 +421,24 @@ Notes:
- Use `action: "list"` to inspect registered providers, default models, supported model ids, sizes, resolutions, and edit support.
- Returns local `MEDIA:<path>` lines so channels can deliver the generated files directly.
- Uses the image-generation model directly (independent of the main chat model).
- Google-backed flows support reference-image edits plus explicit `1K|2K|4K` resolution hints.
- Google-backed flows, including `google/gemini-3-pro-image-preview` for the native Nano Banana-style path, support reference-image edits plus explicit `1K|2K|4K` resolution hints.
- When editing and `resolution` is omitted, OpenClaw infers a draft/final resolution from the input image size.
- This is the built-in replacement for the old sample `nano-banana-pro` skill workflow. Use `agents.defaults.imageGenerationModel`, not `skills.entries`, for stock image generation.
- This is the built-in replacement for the old `nano-banana-pro` skill workflow. Use `agents.defaults.imageGenerationModel`, not `skills.entries`, for stock image generation.
Native example:
```json5
{
agents: {
defaults: {
imageGenerationModel: {
primary: "google/gemini-3-pro-image-preview", // native Nano Banana path
fallbacks: ["fal/fal-ai/flux/dev"],
},
},
},
}
```
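With that config in place, a call to the tool might pass the schema fields described above (values here are illustrative):

```json5
{
  prompt: "Combine these into one scene",
  images: ["ref-1.png", "ref-2.png"],
  aspectRatio: "16:9",
  resolution: "2K",
  filename: "scene.png",
}
```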
### `pdf`


@ -42,6 +42,11 @@ For built-in image generation/editing, prefer `agents.defaults.imageGenerationMo
plus the core `image_generate` tool. `skills.entries.*` is only for custom or
third-party skill workflows.
Examples:
- Native Nano Banana-style setup: `agents.defaults.imageGenerationModel.primary: "google/gemini-3-pro-image-preview"`
- Native fal setup: `agents.defaults.imageGenerationModel.primary: "fal/fal-ai/flux/dev"`
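For contrast, a sketch showing the native path alongside an explicit third-party skill entry (the skill name below is hypothetical):

```json5
{
  agents: {
    defaults: {
      // Built-in image generation: no skills.entries needed.
      imageGenerationModel: { primary: "google/gemini-3-pro-image-preview" },
    },
  },
  skills: {
    entries: {
      // Only custom or third-party skill workflows belong here.
      "my-custom-image-skill": { enabled: true },
    },
  },
}
```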
## Fields
- `allowBundled`: optional allowlist for **bundled** skills only. When set, only


@ -1,65 +0,0 @@
---
name: nano-banana-pro
description: Generate or edit images via Gemini 3 Pro Image (Nano Banana Pro).
homepage: https://ai.google.dev/
metadata:
{
"openclaw":
{
"emoji": "🍌",
"requires": { "bins": ["uv"], "env": ["GEMINI_API_KEY"] },
"primaryEnv": "GEMINI_API_KEY",
"install":
[
{
"id": "uv-brew",
"kind": "brew",
"formula": "uv",
"bins": ["uv"],
"label": "Install uv (brew)",
},
],
},
}
---
# Nano Banana Pro (Gemini 3 Pro Image)
Use the bundled script to generate or edit images.
Generate
```bash
uv run {baseDir}/scripts/generate_image.py --prompt "your image description" --filename "output.png" --resolution 1K
```
Edit (single image)
```bash
uv run {baseDir}/scripts/generate_image.py --prompt "edit instructions" --filename "output.png" -i "/path/in.png" --resolution 2K
```
Multi-image composition (up to 14 images)
```bash
uv run {baseDir}/scripts/generate_image.py --prompt "combine these into one scene" --filename "output.png" -i img1.png -i img2.png -i img3.png
```
API key
- `GEMINI_API_KEY` env var
- Or set `skills."nano-banana-pro".apiKey` / `skills."nano-banana-pro".env.GEMINI_API_KEY` in `~/.openclaw/openclaw.json`
Specific aspect ratio (optional)
```bash
uv run {baseDir}/scripts/generate_image.py --prompt "portrait photo" --filename "output.png" --aspect-ratio 9:16
```
Notes
- Resolutions: `1K` (default), `2K`, `4K`.
- Aspect ratios: `1:1`, `2:3`, `3:2`, `3:4`, `4:3`, `4:5`, `5:4`, `9:16`, `16:9`, `21:9`. Without `--aspect-ratio` / `-a`, the model picks freely; use this flag for avatars, profile pics, or consistent batch generation.
- Use timestamps in filenames: `yyyy-mm-dd-hh-mm-ss-name.png`.
- The script prints a `MEDIA:` line for OpenClaw to auto-attach on supported chat providers.
- Do not read the image back; report the saved path only.


@ -1,235 +0,0 @@
#!/usr/bin/env python3
# /// script
# requires-python = ">=3.10"
# dependencies = [
# "google-genai>=1.0.0",
# "pillow>=10.0.0",
# ]
# ///
"""
Generate images using Google's Nano Banana Pro (Gemini 3 Pro Image) API.
Usage:
uv run generate_image.py --prompt "your image description" --filename "output.png" [--resolution 1K|2K|4K] [--api-key KEY]
Multi-image editing (up to 14 images):
uv run generate_image.py --prompt "combine these images" --filename "output.png" -i img1.png -i img2.png -i img3.png
"""
import argparse
import os
import sys
from pathlib import Path
SUPPORTED_ASPECT_RATIOS = [
"1:1",
"2:3",
"3:2",
"3:4",
"4:3",
"4:5",
"5:4",
"9:16",
"16:9",
"21:9",
]
def get_api_key(provided_key: str | None) -> str | None:
"""Get API key from argument first, then environment."""
if provided_key:
return provided_key
return os.environ.get("GEMINI_API_KEY")
def auto_detect_resolution(max_input_dim: int) -> str:
"""Infer output resolution from the largest input image dimension."""
if max_input_dim >= 3000:
return "4K"
if max_input_dim >= 1500:
return "2K"
return "1K"
def choose_output_resolution(
requested_resolution: str | None,
max_input_dim: int,
has_input_images: bool,
) -> tuple[str, bool]:
"""Choose final resolution and whether it was auto-detected.
Auto-detection is only applied when the user did not pass --resolution.
"""
if requested_resolution is not None:
return requested_resolution, False
if has_input_images and max_input_dim > 0:
return auto_detect_resolution(max_input_dim), True
return "1K", False
def main():
parser = argparse.ArgumentParser(
description="Generate images using Nano Banana Pro (Gemini 3 Pro Image)"
)
parser.add_argument(
"--prompt", "-p",
required=True,
help="Image description/prompt"
)
parser.add_argument(
"--filename", "-f",
required=True,
help="Output filename (e.g., sunset-mountains.png)"
)
parser.add_argument(
"--input-image", "-i",
action="append",
dest="input_images",
metavar="IMAGE",
help="Input image path(s) for editing/composition. Can be specified multiple times (up to 14 images)."
)
parser.add_argument(
"--resolution", "-r",
choices=["1K", "2K", "4K"],
default=None,
help="Output resolution: 1K, 2K, or 4K. If omitted with input images, auto-detect from largest image dimension."
)
parser.add_argument(
"--aspect-ratio", "-a",
choices=SUPPORTED_ASPECT_RATIOS,
default=None,
help=f"Output aspect ratio (default: model decides). Options: {', '.join(SUPPORTED_ASPECT_RATIOS)}"
)
parser.add_argument(
"--api-key", "-k",
help="Gemini API key (overrides GEMINI_API_KEY env var)"
)
args = parser.parse_args()
# Get API key
api_key = get_api_key(args.api_key)
if not api_key:
print("Error: No API key provided.", file=sys.stderr)
print("Please either:", file=sys.stderr)
print(" 1. Provide --api-key argument", file=sys.stderr)
print(" 2. Set GEMINI_API_KEY environment variable", file=sys.stderr)
sys.exit(1)
# Import here after checking API key to avoid slow import on error
from google import genai
from google.genai import types
from PIL import Image as PILImage
# Initialise client
client = genai.Client(api_key=api_key)
# Set up output path
output_path = Path(args.filename)
output_path.parent.mkdir(parents=True, exist_ok=True)
# Load input images if provided (up to 14 supported by Nano Banana Pro)
input_images = []
max_input_dim = 0
if args.input_images:
if len(args.input_images) > 14:
print(f"Error: Too many input images ({len(args.input_images)}). Maximum is 14.", file=sys.stderr)
sys.exit(1)
for img_path in args.input_images:
try:
with PILImage.open(img_path) as img:
copied = img.copy()
width, height = copied.size
input_images.append(copied)
print(f"Loaded input image: {img_path}")
# Track largest dimension for auto-resolution
max_input_dim = max(max_input_dim, width, height)
except Exception as e:
print(f"Error loading input image '{img_path}': {e}", file=sys.stderr)
sys.exit(1)
output_resolution, auto_detected = choose_output_resolution(
requested_resolution=args.resolution,
max_input_dim=max_input_dim,
has_input_images=bool(input_images),
)
if auto_detected:
print(
f"Auto-detected resolution: {output_resolution} "
f"(from max input dimension {max_input_dim})"
)
# Build contents (images first if editing, prompt only if generating)
if input_images:
contents = [*input_images, args.prompt]
img_count = len(input_images)
print(f"Processing {img_count} image{'s' if img_count > 1 else ''} with resolution {output_resolution}...")
else:
contents = args.prompt
print(f"Generating image with resolution {output_resolution}...")
try:
# Build image config with optional aspect ratio
image_cfg_kwargs = {"image_size": output_resolution}
if args.aspect_ratio:
image_cfg_kwargs["aspect_ratio"] = args.aspect_ratio
response = client.models.generate_content(
model="gemini-3-pro-image-preview",
contents=contents,
config=types.GenerateContentConfig(
response_modalities=["TEXT", "IMAGE"],
image_config=types.ImageConfig(**image_cfg_kwargs)
)
)
# Process response and convert to PNG
image_saved = False
for part in response.parts:
if part.text is not None:
print(f"Model response: {part.text}")
elif part.inline_data is not None:
# Convert inline data to PIL Image and save as PNG
from io import BytesIO
# inline_data.data is already bytes, not base64
image_data = part.inline_data.data
if isinstance(image_data, str):
# If it's a string, it might be base64
import base64
image_data = base64.b64decode(image_data)
image = PILImage.open(BytesIO(image_data))
# Ensure RGB mode for PNG (convert RGBA to RGB with white background if needed)
if image.mode == 'RGBA':
rgb_image = PILImage.new('RGB', image.size, (255, 255, 255))
rgb_image.paste(image, mask=image.split()[3])
rgb_image.save(str(output_path), 'PNG')
elif image.mode == 'RGB':
image.save(str(output_path), 'PNG')
else:
image.convert('RGB').save(str(output_path), 'PNG')
image_saved = True
if image_saved:
full_path = output_path.resolve()
print(f"\nImage saved: {full_path}")
# OpenClaw parses MEDIA: tokens and will attach the file on
# supported chat providers. Emit the canonical MEDIA:<path> form.
print(f"MEDIA:{full_path}")
else:
print("Error: No image was generated in the response.", file=sys.stderr)
sys.exit(1)
except Exception as e:
print(f"Error generating image: {e}", file=sys.stderr)
sys.exit(1)
if __name__ == "__main__":
main()


@ -1,36 +0,0 @@
import importlib.util
from pathlib import Path
import pytest
MODULE_PATH = Path(__file__).with_name("generate_image.py")
SPEC = importlib.util.spec_from_file_location("generate_image", MODULE_PATH)
assert SPEC and SPEC.loader
MODULE = importlib.util.module_from_spec(SPEC)
SPEC.loader.exec_module(MODULE)
@pytest.mark.parametrize(
("max_input_dim", "expected"),
[
(0, "1K"),
(1499, "1K"),
(1500, "2K"),
(2999, "2K"),
(3000, "4K"),
],
)
def test_auto_detect_resolution_thresholds(max_input_dim, expected):
assert MODULE.auto_detect_resolution(max_input_dim) == expected
def test_choose_output_resolution_auto_detects_when_resolution_omitted():
assert MODULE.choose_output_resolution(None, 2200, True) == ("2K", True)
def test_choose_output_resolution_defaults_to_1k_without_inputs():
assert MODULE.choose_output_resolution(None, 0, False) == ("1K", False)
def test_choose_output_resolution_respects_explicit_1k_with_large_input():
assert MODULE.choose_output_resolution("1K", 3500, True) == ("1K", False)


@ -14,8 +14,23 @@ function stubImageGenerationProviders() {
id: "google",
defaultModel: "gemini-3.1-flash-image-preview",
models: ["gemini-3.1-flash-image-preview", "gemini-3-pro-image-preview"],
supportedResolutions: ["1K", "2K", "4K"],
supportsImageEditing: true,
capabilities: {
generate: {
maxCount: 4,
supportsAspectRatio: true,
supportsResolution: true,
},
edit: {
enabled: true,
maxInputImages: 5,
supportsAspectRatio: true,
supportsResolution: true,
},
geometry: {
resolutions: ["1K", "2K", "4K"],
aspectRatios: ["1:1", "16:9"],
},
},
generateImage: vi.fn(async () => {
throw new Error("not used");
}),
@ -24,8 +39,19 @@ function stubImageGenerationProviders() {
id: "openai",
defaultModel: "gpt-image-1",
models: ["gpt-image-1"],
supportedSizes: ["1024x1024", "1024x1536", "1536x1024"],
supportsImageEditing: false,
capabilities: {
generate: {
maxCount: 4,
supportsSize: true,
},
edit: {
enabled: false,
maxInputImages: 0,
},
geometry: {
sizes: ["1024x1024", "1024x1536", "1536x1024"],
},
},
generateImage: vi.fn(async () => {
throw new Error("not used");
}),
@ -138,6 +164,7 @@ describe("createImageGenerateTool", () => {
const result = await tool.execute("call-1", {
prompt: "A cat wearing sunglasses",
model: "openai/gpt-image-1",
filename: "cats/output.png",
count: 2,
size: "1024x1024",
});
@ -167,7 +194,7 @@ describe("createImageGenerateTool", () => {
"image/png",
"tool-image-generation",
undefined,
"cat-one.png",
"cats/output.png",
);
expect(saveMediaBuffer).toHaveBeenNthCalledWith(
2,
@ -175,7 +202,7 @@ describe("createImageGenerateTool", () => {
"image/png",
"tool-image-generation",
undefined,
"cat-two.png",
"cats/output.png",
);
expect(result).toMatchObject({
content: [
@ -189,6 +216,7 @@ describe("createImageGenerateTool", () => {
model: "gpt-image-1",
count: 2,
paths: ["/tmp/generated-1.png", "/tmp/generated-2.png"],
filename: "cats/output.png",
revisedPrompts: ["A more cinematic cat"],
},
});
@ -273,6 +301,7 @@ describe("createImageGenerateTool", () => {
expect(generateImage).toHaveBeenCalledWith(
expect.objectContaining({
aspectRatio: undefined,
resolution: "4K",
inputImages: [
expect.objectContaining({
@ -284,6 +313,91 @@ describe("createImageGenerateTool", () => {
);
});
it("forwards explicit aspect ratio and supports up to 5 reference images", async () => {
const generateImage = vi.spyOn(imageGenerationRuntime, "generateImage").mockResolvedValue({
provider: "google",
model: "gemini-3-pro-image-preview",
attempts: [],
images: [
{
buffer: Buffer.from("png-out"),
mimeType: "image/png",
fileName: "edited.png",
},
],
});
vi.spyOn(webMedia, "loadWebMedia").mockResolvedValue({
kind: "image",
buffer: Buffer.from("input-image"),
contentType: "image/png",
});
vi.spyOn(mediaStore, "saveMediaBuffer").mockResolvedValue({
path: "/tmp/edited.png",
id: "edited.png",
size: 7,
contentType: "image/png",
});
const tool = createImageGenerateTool({
config: {
agents: {
defaults: {
imageGenerationModel: {
primary: "google/gemini-3-pro-image-preview",
},
},
},
},
workspaceDir: process.cwd(),
});
expect(tool).not.toBeNull();
if (!tool) {
throw new Error("expected image_generate tool");
}
const images = Array.from({ length: 5 }, (_, index) => `./fixtures/ref-${index + 1}.png`);
await tool.execute("call-compose", {
prompt: "Combine these into one scene",
images,
aspectRatio: "16:9",
});
expect(generateImage).toHaveBeenCalledWith(
expect.objectContaining({
aspectRatio: "16:9",
inputImages: expect.arrayContaining([
expect.objectContaining({ buffer: Buffer.from("input-image"), mimeType: "image/png" }),
]),
}),
);
expect(generateImage.mock.calls[0]?.[0].inputImages).toHaveLength(5);
});
it("rejects unsupported aspect ratios", async () => {
const tool = createImageGenerateTool({
config: {
agents: {
defaults: {
imageGenerationModel: {
primary: "google/gemini-3-pro-image-preview",
},
},
},
},
});
expect(tool).not.toBeNull();
if (!tool) {
throw new Error("expected image_generate tool");
}
await expect(tool.execute("call-bad-aspect", { prompt: "portrait", aspectRatio: "7:5" }))
.rejects.toThrow(
"aspectRatio must be one of 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, or 21:9",
);
});
it("lists registered provider and model options", async () => {
stubImageGenerationProviders();
@ -310,7 +424,8 @@ describe("createImageGenerateTool", () => {
expect(text).toContain("google (default gemini-3.1-flash-image-preview)");
expect(text).toContain("gemini-3.1-flash-image-preview");
expect(text).toContain("gemini-3-pro-image-preview");
expect(text).toContain("editing");
expect(text).toContain("editing up to 5 refs");
expect(text).toContain("aspect ratios 1:1, 16:9");
expect(result).toMatchObject({
details: {
providers: expect.arrayContaining([
@ -321,9 +436,139 @@ describe("createImageGenerateTool", () => {
"gemini-3.1-flash-image-preview",
"gemini-3-pro-image-preview",
]),
capabilities: expect.objectContaining({
edit: expect.objectContaining({
enabled: true,
maxInputImages: 5,
}),
}),
}),
]),
},
});
});
it("rejects provider-specific edit limits before runtime", async () => {
vi.spyOn(imageGenerationRuntime, "listRuntimeImageGenerationProviders").mockReturnValue([
{
id: "fal",
defaultModel: "fal-ai/flux/dev",
models: ["fal-ai/flux/dev", "fal-ai/flux/dev/image-to-image"],
capabilities: {
generate: {
maxCount: 4,
supportsSize: true,
supportsAspectRatio: true,
supportsResolution: true,
},
edit: {
enabled: true,
maxInputImages: 1,
supportsSize: true,
supportsAspectRatio: false,
supportsResolution: true,
},
},
generateImage: vi.fn(async () => {
throw new Error("not used");
}),
},
]);
const generateImage = vi.spyOn(imageGenerationRuntime, "generateImage");
vi.spyOn(webMedia, "loadWebMedia").mockResolvedValue({
kind: "image",
buffer: Buffer.from("input-image"),
contentType: "image/png",
});
const tool = createImageGenerateTool({
config: {
agents: {
defaults: {
imageGenerationModel: {
primary: "fal/fal-ai/flux/dev",
},
},
},
},
workspaceDir: process.cwd(),
});
expect(tool).not.toBeNull();
if (!tool) {
throw new Error("expected image_generate tool");
}
await expect(
tool.execute("call-fal-edit", {
prompt: "combine",
images: ["./fixtures/a.png", "./fixtures/b.png"],
}),
).rejects.toThrow("fal edit supports at most 1 reference image");
expect(generateImage).not.toHaveBeenCalled();
});
it("rejects unsupported provider-specific edit aspect ratio overrides before runtime", async () => {
vi.spyOn(imageGenerationRuntime, "listRuntimeImageGenerationProviders").mockReturnValue([
{
id: "fal",
defaultModel: "fal-ai/flux/dev",
models: ["fal-ai/flux/dev", "fal-ai/flux/dev/image-to-image"],
capabilities: {
generate: {
maxCount: 4,
supportsSize: true,
supportsAspectRatio: true,
supportsResolution: true,
},
edit: {
enabled: true,
maxInputImages: 1,
supportsSize: true,
supportsAspectRatio: false,
supportsResolution: true,
},
geometry: {
aspectRatios: ["1:1", "16:9"],
},
},
generateImage: vi.fn(async () => {
throw new Error("not used");
}),
},
]);
const generateImage = vi.spyOn(imageGenerationRuntime, "generateImage");
vi.spyOn(webMedia, "loadWebMedia").mockResolvedValue({
kind: "image",
buffer: Buffer.from("input-image"),
contentType: "image/png",
});
const tool = createImageGenerateTool({
config: {
agents: {
defaults: {
imageGenerationModel: {
primary: "fal/fal-ai/flux/dev",
},
},
},
},
workspaceDir: process.cwd(),
});
expect(tool).not.toBeNull();
if (!tool) {
throw new Error("expected image_generate tool");
}
await expect(
tool.execute("call-fal-aspect", {
prompt: "edit",
image: "./fixtures/a.png",
aspectRatio: "16:9",
}),
).rejects.toThrow("fal edit does not support aspectRatio overrides");
expect(generateImage).not.toHaveBeenCalled();
});
});


@ -6,6 +6,7 @@ import {
listRuntimeImageGenerationProviders,
} from "../../image-generation/runtime.js";
import type {
ImageGenerationProvider,
ImageGenerationResolution,
ImageGenerationSourceImage,
} from "../../image-generation/types.js";
@ -36,8 +37,20 @@ import {
const DEFAULT_COUNT = 1;
const MAX_COUNT = 4;
const MAX_INPUT_IMAGES = 4;
const MAX_INPUT_IMAGES = 5;
const DEFAULT_RESOLUTION: ImageGenerationResolution = "1K";
const SUPPORTED_ASPECT_RATIOS = new Set([
"1:1",
"2:3",
"3:2",
"3:4",
"4:3",
"4:5",
"5:4",
"9:16",
"16:9",
"21:9",
]);
const ImageGenerateToolSchema = Type.Object({
action: Type.Optional(
@ -60,12 +73,24 @@ const ImageGenerateToolSchema = Type.Object({
model: Type.Optional(
Type.String({ description: "Optional provider/model override, e.g. openai/gpt-image-1." }),
),
filename: Type.Optional(
Type.String({
description:
"Optional output filename hint. OpenClaw preserves the basename and saves under its managed media directory.",
}),
),
size: Type.Optional(
Type.String({
description:
"Optional size hint like 1024x1024, 1536x1024, 1024x1536, 1024x1792, or 1792x1024.",
}),
),
aspectRatio: Type.Optional(
Type.String({
description:
"Optional aspect ratio hint: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, or 21:9.",
}),
),
resolution: Type.Optional(
Type.String({
description:
@ -162,6 +187,19 @@ function normalizeResolution(raw: string | undefined): ImageGenerationResolution
throw new ToolInputError("resolution must be one of 1K, 2K, or 4K");
}
function normalizeAspectRatio(raw: string | undefined): string | undefined {
const normalized = raw?.trim();
if (!normalized) {
return undefined;
}
if (SUPPORTED_ASPECT_RATIOS.has(normalized)) {
return normalized;
}
throw new ToolInputError(
"aspectRatio must be one of 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, or 21:9",
);
}
function normalizeReferenceImages(args: Record<string, unknown>): string[] {
const imageCandidates: string[] = [];
if (typeof args.image === "string") {
@ -192,6 +230,112 @@ function normalizeReferenceImages(args: Record<string, unknown>): string[] {
return normalized;
}
function parseImageGenerationModelRef(raw: string | undefined): { provider: string; model: string } | null {
const trimmed = raw?.trim();
if (!trimmed) {
return null;
}
const slashIndex = trimmed.indexOf("/");
if (slashIndex <= 0 || slashIndex === trimmed.length - 1) {
return null;
}
return {
provider: trimmed.slice(0, slashIndex).trim(),
model: trimmed.slice(slashIndex + 1).trim(),
};
}
function resolveSelectedImageGenerationProvider(params: {
config?: OpenClawConfig;
imageGenerationModelConfig: ToolModelConfig;
modelOverride?: string;
}): ImageGenerationProvider | undefined {
const selectedRef =
parseImageGenerationModelRef(params.modelOverride) ??
parseImageGenerationModelRef(params.imageGenerationModelConfig.primary);
if (!selectedRef) {
return undefined;
}
return listRuntimeImageGenerationProviders({ config: params.config }).find(
(provider) =>
provider.id === selectedRef.provider || (provider.aliases ?? []).includes(selectedRef.provider),
);
}
function validateImageGenerationCapabilities(params: {
provider: ImageGenerationProvider | undefined;
count: number;
inputImageCount: number;
size?: string;
aspectRatio?: string;
resolution?: ImageGenerationResolution;
}) {
const provider = params.provider;
if (!provider) {
return;
}
const isEdit = params.inputImageCount > 0;
const modeCaps = isEdit ? provider.capabilities.edit : provider.capabilities.generate;
const geometry = provider.capabilities.geometry;
const maxCount = modeCaps.maxCount ?? MAX_COUNT;
if (params.count > maxCount) {
throw new ToolInputError(
`${provider.id} ${isEdit ? "edit" : "generate"} supports at most ${maxCount} output image${maxCount === 1 ? "" : "s"}.`,
);
}
if (isEdit) {
if (!provider.capabilities.edit.enabled) {
throw new ToolInputError(`${provider.id} does not support reference-image edits.`);
}
const maxInputImages = provider.capabilities.edit.maxInputImages ?? MAX_INPUT_IMAGES;
if (params.inputImageCount > maxInputImages) {
throw new ToolInputError(
`${provider.id} edit supports at most ${maxInputImages} reference image${maxInputImages === 1 ? "" : "s"}.`,
);
}
}
if (params.size) {
if (!modeCaps.supportsSize) {
throw new ToolInputError(`${provider.id} ${isEdit ? "edit" : "generate"} does not support size overrides.`);
}
if ((geometry?.sizes?.length ?? 0) > 0 && !geometry?.sizes?.includes(params.size)) {
throw new ToolInputError(
`${provider.id} ${isEdit ? "edit" : "generate"} size must be one of ${geometry?.sizes?.join(", ")}.`,
);
}
}
if (params.aspectRatio) {
if (!modeCaps.supportsAspectRatio) {
throw new ToolInputError(`${provider.id} ${isEdit ? "edit" : "generate"} does not support aspectRatio overrides.`);
}
if (
(geometry?.aspectRatios?.length ?? 0) > 0 &&
!geometry?.aspectRatios?.includes(params.aspectRatio)
) {
throw new ToolInputError(
`${provider.id} ${isEdit ? "edit" : "generate"} aspectRatio must be one of ${geometry?.aspectRatios?.join(", ")}.`,
);
}
}
if (params.resolution) {
if (!modeCaps.supportsResolution) {
throw new ToolInputError(`${provider.id} ${isEdit ? "edit" : "generate"} does not support resolution overrides.`);
}
if (
(geometry?.resolutions?.length ?? 0) > 0 &&
!geometry?.resolutions?.includes(params.resolution)
) {
throw new ToolInputError(
`${provider.id} ${isEdit ? "edit" : "generate"} resolution must be one of ${geometry?.resolutions?.join("/")}.`,
);
}
}
}
type ImageGenerateSandboxConfig = {
root: string;
bridge: SandboxFsBridge;
@ -357,25 +501,25 @@ export function createImageGenerateTool(options?: {
...(provider.label ? { label: provider.label } : {}),
...(provider.defaultModel ? { defaultModel: provider.defaultModel } : {}),
models: provider.models ?? (provider.defaultModel ? [provider.defaultModel] : []),
...(provider.supportedSizes ? { supportedSizes: [...provider.supportedSizes] } : {}),
...(provider.supportedResolutions
? { supportedResolutions: [...provider.supportedResolutions] }
: {}),
...(typeof provider.supportsImageEditing === "boolean"
? { supportsImageEditing: provider.supportsImageEditing }
: {}),
capabilities: provider.capabilities,
}),
);
const lines = providers.flatMap((provider) => {
const caps: string[] = [];
if (provider.supportsImageEditing) {
caps.push("editing");
if (provider.capabilities.edit.enabled) {
const maxRefs = provider.capabilities.edit.maxInputImages;
caps.push(
`editing${typeof maxRefs === "number" ? ` up to ${maxRefs} ref${maxRefs === 1 ? "" : "s"}` : ""}`,
);
}
if ((provider.supportedResolutions?.length ?? 0) > 0) {
caps.push(`resolutions ${provider.supportedResolutions?.join("/")}`);
if ((provider.capabilities.geometry?.resolutions?.length ?? 0) > 0) {
caps.push(`resolutions ${provider.capabilities.geometry?.resolutions?.join("/")}`);
}
if ((provider.supportedSizes?.length ?? 0) > 0) {
caps.push(`sizes ${provider.supportedSizes?.join(", ")}`);
if ((provider.capabilities.geometry?.sizes?.length ?? 0) > 0) {
caps.push(`sizes ${provider.capabilities.geometry?.sizes?.join(", ")}`);
}
if ((provider.capabilities.geometry?.aspectRatios?.length ?? 0) > 0) {
caps.push(`aspect ratios ${provider.capabilities.geometry?.aspectRatios?.join(", ")}`);
}
const modelLine =
provider.models.length > 0
@ -396,7 +540,9 @@ export function createImageGenerateTool(options?: {
const prompt = readStringParam(params, "prompt", { required: true });
const imageInputs = normalizeReferenceImages(params);
const model = readStringParam(params, "model");
const filename = readStringParam(params, "filename");
const size = readStringParam(params, "size");
const aspectRatio = normalizeAspectRatio(readStringParam(params, "aspectRatio"));
const explicitResolution = normalizeResolution(readStringParam(params, "resolution"));
const count = resolveRequestedCount(params);
const loadedReferenceImages = await loadReferenceImages({
@ -412,6 +558,19 @@ export function createImageGenerateTool(options?: {
: inputImages.length > 0
? await inferResolutionFromInputImages(inputImages)
: undefined);
const selectedProvider = resolveSelectedImageGenerationProvider({
config: effectiveCfg,
imageGenerationModelConfig,
modelOverride: model,
});
validateImageGenerationCapabilities({
provider: selectedProvider,
count,
inputImageCount: inputImages.length,
size,
aspectRatio,
resolution,
});
const result = await generateImage({
cfg: effectiveCfg,
@ -419,6 +578,7 @@ export function createImageGenerateTool(options?: {
agentDir: options?.agentDir,
modelOverride: model,
size,
aspectRatio,
resolution,
count,
inputImages,
@ -431,7 +591,7 @@ export function createImageGenerateTool(options?: {
image.mimeType,
"tool-image-generation",
undefined,
image.fileName,
filename || image.fileName,
),
),
);
@ -468,6 +628,8 @@ export function createImageGenerateTool(options?: {
: {}),
...(resolution ? { resolution } : {}),
...(size ? { size } : {}),
...(aspectRatio ? { aspectRatio } : {}),
...(filename ? { filename } : {}),
attempts: result.attempts,
metadata: result.metadata,
...(revisedPrompts.length > 0 ? { revisedPrompts } : {}),


@ -297,4 +297,99 @@ describe("normalizeCompatibilityConfigValues", () => {
"Moved browser.ssrfPolicy.allowPrivateNetwork → browser.ssrfPolicy.dangerouslyAllowPrivateNetwork (true).",
);
});
it("migrates nano-banana skill config to native image generation config", () => {
const res = normalizeCompatibilityConfigValues({
skills: {
entries: {
"nano-banana-pro": {
enabled: true,
apiKey: { source: "env", provider: "default", id: "GEMINI_API_KEY" },
},
},
},
});
expect(res.config.agents?.defaults?.imageGenerationModel).toEqual({
primary: "google/gemini-3-pro-image-preview",
});
expect(res.config.models?.providers?.google?.apiKey).toEqual({
source: "env",
provider: "default",
id: "GEMINI_API_KEY",
});
expect(res.config.skills?.entries).toBeUndefined();
expect(res.changes).toEqual([
"Moved skills.entries.nano-banana-pro → agents.defaults.imageGenerationModel.primary (google/gemini-3-pro-image-preview).",
"Moved skills.entries.nano-banana-pro.apiKey → models.providers.google.apiKey.",
"Removed legacy skills.entries.nano-banana-pro.",
]);
});
it("prefers legacy nano-banana env.GEMINI_API_KEY over skill apiKey during migration", () => {
const res = normalizeCompatibilityConfigValues({
skills: {
entries: {
"nano-banana-pro": {
apiKey: "ignored-skill-api-key",
env: {
GEMINI_API_KEY: "env-gemini-key",
},
},
},
},
});
expect(res.config.models?.providers?.google?.apiKey).toBe("env-gemini-key");
expect(res.changes).toContain(
"Moved skills.entries.nano-banana-pro.env.GEMINI_API_KEY → models.providers.google.apiKey.",
);
});
it("preserves explicit native config while removing legacy nano-banana skill config", () => {
const res = normalizeCompatibilityConfigValues({
agents: {
defaults: {
imageGenerationModel: {
primary: "fal/fal-ai/flux/dev",
},
},
},
models: {
providers: {
google: {
apiKey: "existing-google-key",
},
},
},
skills: {
entries: {
"nano-banana-pro": {
apiKey: "legacy-gemini-key",
},
peekaboo: { enabled: true },
},
},
});
expect(res.config.agents?.defaults?.imageGenerationModel).toEqual({
primary: "fal/fal-ai/flux/dev",
});
expect(res.config.models?.providers?.google?.apiKey).toBe("existing-google-key");
expect(res.config.skills?.entries).toEqual({
peekaboo: { enabled: true },
});
expect(res.changes).toEqual(["Removed legacy skills.entries.nano-banana-pro."]);
});
it("removes nano-banana from skills.allowBundled during migration", () => {
const res = normalizeCompatibilityConfigValues({
skills: {
allowBundled: ["peekaboo", "nano-banana-pro"],
},
});
expect(res.config.skills?.allowBundled).toEqual(["peekaboo"]);
expect(res.changes).toEqual(["Removed nano-banana-pro from skills.allowBundled."]);
});
});

View File

@@ -15,6 +15,8 @@ export function normalizeCompatibilityConfigValues(cfg: OpenClawConfig): {
changes: string[];
} {
const changes: string[] = [];
const NANO_BANANA_SKILL_KEY = "nano-banana-pro";
const NANO_BANANA_MODEL = "google/gemini-3-pro-image-preview";
let next: OpenClawConfig = cfg;
const isRecord = (value: unknown): value is Record<string, unknown> =>
@@ -471,7 +473,121 @@ export function normalizeCompatibilityConfigValues(cfg: OpenClawConfig): {
);
};
const normalizeLegacyNanoBananaSkill = () => {
const rawSkills = next.skills;
if (!isRecord(rawSkills)) {
return;
}
let skillsChanged = false;
let skills = structuredClone(rawSkills);
if (Array.isArray(skills.allowBundled)) {
const allowBundled = skills.allowBundled.filter(
(value) => typeof value !== "string" || value.trim() !== NANO_BANANA_SKILL_KEY,
);
if (allowBundled.length !== skills.allowBundled.length) {
if (allowBundled.length === 0) {
delete skills.allowBundled;
changes.push(`Removed skills.allowBundled entry for ${NANO_BANANA_SKILL_KEY}.`);
} else {
skills.allowBundled = allowBundled;
changes.push(`Removed ${NANO_BANANA_SKILL_KEY} from skills.allowBundled.`);
}
skillsChanged = true;
}
}
const rawEntries = skills.entries;
if (!isRecord(rawEntries)) {
if (skillsChanged) {
next = { ...next, skills };
}
return;
}
const rawLegacyEntry = rawEntries[NANO_BANANA_SKILL_KEY];
if (!isRecord(rawLegacyEntry)) {
if (skillsChanged) {
next = { ...next, skills };
}
return;
}
const existingImageGenerationModel = next.agents?.defaults?.imageGenerationModel;
if (existingImageGenerationModel === undefined) {
next = {
...next,
agents: {
...next.agents,
defaults: {
...next.agents?.defaults,
imageGenerationModel: {
primary: NANO_BANANA_MODEL,
},
},
},
};
changes.push(
`Moved skills.entries.${NANO_BANANA_SKILL_KEY} → agents.defaults.imageGenerationModel.primary (${NANO_BANANA_MODEL}).`,
);
}
const legacyEnv = isRecord(rawLegacyEntry.env) ? rawLegacyEntry.env : undefined;
const legacyEnvApiKey =
typeof legacyEnv?.GEMINI_API_KEY === "string" ? legacyEnv.GEMINI_API_KEY.trim() : "";
const legacyApiKey =
legacyEnvApiKey ||
(typeof rawLegacyEntry.apiKey === "string"
? rawLegacyEntry.apiKey.trim()
: rawLegacyEntry.apiKey && isRecord(rawLegacyEntry.apiKey)
? structuredClone(rawLegacyEntry.apiKey)
: undefined);
const rawModels = isRecord(next.models) ? structuredClone(next.models) : {};
const rawProviders = isRecord(rawModels.providers) ? { ...rawModels.providers } : {};
const rawGoogle = isRecord(rawProviders.google) ? { ...rawProviders.google } : {};
const hasGoogleApiKey = rawGoogle.apiKey !== undefined;
if (!hasGoogleApiKey && legacyApiKey) {
rawGoogle.apiKey = legacyApiKey;
rawProviders.google = rawGoogle;
rawModels.providers = rawProviders;
next = {
...next,
models: rawModels as OpenClawConfig["models"],
};
changes.push(
`Moved skills.entries.${NANO_BANANA_SKILL_KEY}.${legacyEnvApiKey ? "env.GEMINI_API_KEY" : "apiKey"} → models.providers.google.apiKey.`,
);
}
const entries = { ...rawEntries };
delete entries[NANO_BANANA_SKILL_KEY];
if (Object.keys(entries).length === 0) {
delete skills.entries;
} else {
skills.entries = entries;
}
changes.push(`Removed legacy skills.entries.${NANO_BANANA_SKILL_KEY}.`);
skillsChanged = true;
if (Object.keys(skills).length === 0) {
const { skills: _ignored, ...rest } = next;
next = rest;
return;
}
if (skillsChanged) {
next = {
...next,
skills,
};
}
};
normalizeBrowserSsrfPolicyAlias();
normalizeLegacyNanoBananaSkill();
const legacyAckReaction = cfg.messages?.ackReaction?.trim();
const hasWhatsAppConfig = cfg.channels?.whatsapp !== undefined;
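The net effect of `normalizeLegacyNanoBananaSkill` on a config object can be exercised standalone. The sketch below is a deliberately simplified reimplementation (hypothetical `MiniConfig` subset; the real normalizer above also migrates API keys and `allowBundled`):

```typescript
// Pared-down sketch of the nano-banana migration. MiniConfig is a
// hypothetical subset of OpenClawConfig used only for illustration.
type MiniConfig = {
  skills?: { entries?: Record<string, unknown> };
  agents?: { defaults?: { imageGenerationModel?: { primary: string } } };
};

function migrateNanoBanana(cfg: MiniConfig): MiniConfig {
  const entries = cfg.skills?.entries;
  if (!entries || !("nano-banana-pro" in entries)) {
    return cfg;
  }
  const { ["nano-banana-pro"]: _legacy, ...rest } = entries;
  const { skills: _skills, ...base } = cfg;
  return {
    ...base,
    agents: {
      ...cfg.agents,
      defaults: {
        ...cfg.agents?.defaults,
        // Keep an explicit native model if one is already configured.
        imageGenerationModel: cfg.agents?.defaults?.imageGenerationModel ?? {
          primary: "google/gemini-3-pro-image-preview",
        },
      },
    },
    // Drop skills entirely when the legacy entry was its only content.
    ...(Object.keys(rest).length > 0 ? { skills: { entries: rest } } : {}),
  };
}

const migrated = migrateNanoBanana({
  skills: { entries: { "nano-banana-pro": { enabled: true } } },
});
```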

View File

@@ -127,6 +127,97 @@ describe("fal image-generation provider", () => {
);
});
it("maps aspect ratio for text generation without forcing a square default", async () => {
vi.spyOn(modelAuth, "resolveApiKeyForProvider").mockResolvedValue({
apiKey: "fal-test-key",
source: "env",
mode: "api-key",
});
const fetchMock = vi
.fn()
.mockResolvedValueOnce({
ok: true,
json: async () => ({
images: [{ url: "https://v3.fal.media/files/example/wide.png" }],
}),
})
.mockResolvedValueOnce({
ok: true,
headers: new Headers({ "content-type": "image/png" }),
arrayBuffer: async () => Buffer.from("wide-data"),
});
vi.stubGlobal("fetch", fetchMock);
const provider = buildFalImageGenerationProvider();
await provider.generateImage({
provider: "fal",
model: "fal-ai/flux/dev",
prompt: "wide cinematic shot",
cfg: {},
aspectRatio: "16:9",
});
expect(fetchMock).toHaveBeenNthCalledWith(
1,
"https://fal.run/fal-ai/flux/dev",
expect.objectContaining({
method: "POST",
body: JSON.stringify({
prompt: "wide cinematic shot",
image_size: "landscape_16_9",
num_images: 1,
output_format: "png",
}),
}),
);
});
it("combines resolution and aspect ratio for text generation", async () => {
vi.spyOn(modelAuth, "resolveApiKeyForProvider").mockResolvedValue({
apiKey: "fal-test-key",
source: "env",
mode: "api-key",
});
const fetchMock = vi
.fn()
.mockResolvedValueOnce({
ok: true,
json: async () => ({
images: [{ url: "https://v3.fal.media/files/example/portrait.png" }],
}),
})
.mockResolvedValueOnce({
ok: true,
headers: new Headers({ "content-type": "image/png" }),
arrayBuffer: async () => Buffer.from("portrait-data"),
});
vi.stubGlobal("fetch", fetchMock);
const provider = buildFalImageGenerationProvider();
await provider.generateImage({
provider: "fal",
model: "fal-ai/flux/dev",
prompt: "portrait poster",
cfg: {},
resolution: "2K",
aspectRatio: "9:16",
});
expect(fetchMock).toHaveBeenNthCalledWith(
1,
"https://fal.run/fal-ai/flux/dev",
expect.objectContaining({
method: "POST",
body: JSON.stringify({
prompt: "portrait poster",
image_size: { width: 1152, height: 2048 },
num_images: 1,
output_format: "png",
}),
}),
);
});
it("rejects multi-image edit requests for now", async () => {
vi.spyOn(modelAuth, "resolveApiKeyForProvider").mockResolvedValue({
apiKey: "fal-test-key",
@@ -148,4 +239,24 @@ describe("fal image-generation provider", () => {
}),
).rejects.toThrow("at most one reference image");
});
it("rejects aspect ratio overrides for the current edit endpoint", async () => {
vi.spyOn(modelAuth, "resolveApiKeyForProvider").mockResolvedValue({
apiKey: "fal-test-key",
source: "env",
mode: "api-key",
});
const provider = buildFalImageGenerationProvider();
await expect(
provider.generateImage({
provider: "fal",
model: "fal-ai/flux/dev",
prompt: "make it widescreen",
cfg: {},
aspectRatio: "16:9",
inputImages: [{ buffer: Buffer.from("one"), mimeType: "image/png" }],
}),
).rejects.toThrow("does not support aspectRatio overrides");
});
});

View File

@@ -5,8 +5,15 @@ import type { GeneratedImageAsset } from "../types.js";
const DEFAULT_FAL_BASE_URL = "https://fal.run";
const DEFAULT_FAL_IMAGE_MODEL = "fal-ai/flux/dev";
const DEFAULT_FAL_EDIT_SUBPATH = "image-to-image";
const DEFAULT_OUTPUT_FORMAT = "png";
const FAL_SUPPORTED_SIZES = [
"1024x1024",
"1024x1536",
"1536x1024",
"1024x1792",
"1792x1024",
] as const;
const FAL_SUPPORTED_ASPECT_RATIOS = ["1:1", "4:3", "3:4", "16:9", "9:16"] as const;
type FalGeneratedImage = {
url?: string;
@@ -57,23 +64,85 @@ function parseSize(raw: string | undefined): { width: number; height: number } |
return { width, height };
}
function mapResolutionToEdge(resolution: "1K" | "2K" | "4K" | undefined): number | undefined {
if (!resolution) {
return undefined;
}
return resolution === "4K" ? 4096 : resolution === "2K" ? 2048 : 1024;
}
function aspectRatioToEnum(aspectRatio: string | undefined): string | undefined {
const normalized = aspectRatio?.trim();
if (!normalized) {
return undefined;
}
if (normalized === "1:1") {
return "square_hd";
}
if (normalized === "4:3") {
return "landscape_4_3";
}
if (normalized === "3:4") {
return "portrait_4_3";
}
if (normalized === "16:9") {
return "landscape_16_9";
}
if (normalized === "9:16") {
return "portrait_16_9";
}
return undefined;
}
function aspectRatioToDimensions(aspectRatio: string, edge: number): { width: number; height: number } {
const match = /^(\d+):(\d+)$/u.exec(aspectRatio.trim());
if (!match) {
throw new Error(`Invalid fal aspect ratio: ${aspectRatio}`);
}
const widthRatio = Number.parseInt(match[1] ?? "", 10);
const heightRatio = Number.parseInt(match[2] ?? "", 10);
if (!Number.isFinite(widthRatio) || !Number.isFinite(heightRatio) || widthRatio <= 0 || heightRatio <= 0) {
throw new Error(`Invalid fal aspect ratio: ${aspectRatio}`);
}
if (widthRatio >= heightRatio) {
return {
width: edge,
height: Math.max(1, Math.round((edge * heightRatio) / widthRatio)),
};
}
return {
width: Math.max(1, Math.round((edge * widthRatio) / heightRatio)),
height: edge,
};
}
function resolveFalImageSize(params: {
size?: string;
resolution?: "1K" | "2K" | "4K";
aspectRatio?: string;
hasInputImages: boolean;
}): FalImageSize | undefined {
const parsed = parseSize(params.size);
if (parsed) {
return parsed;
}
const normalizedAspectRatio = params.aspectRatio?.trim();
if (normalizedAspectRatio && params.hasInputImages) {
throw new Error("fal image edit endpoint does not support aspectRatio overrides");
}
const edge = mapResolutionToEdge(params.resolution);
if (normalizedAspectRatio && edge) {
return aspectRatioToDimensions(normalizedAspectRatio, edge);
}
if (edge) {
return { width: edge, height: edge };
}
if (normalizedAspectRatio) {
return aspectRatioToEnum(normalizedAspectRatio) ?? aspectRatioToDimensions(normalizedAspectRatio, 1024);
}
return undefined;
}
function toDataUri(buffer: Buffer, mimeType: string): string {
@@ -111,9 +180,27 @@ export function buildFalImageGenerationProvider(): ImageGenerationProviderPlugin
label: "fal",
defaultModel: DEFAULT_FAL_IMAGE_MODEL,
models: [DEFAULT_FAL_IMAGE_MODEL, `${DEFAULT_FAL_IMAGE_MODEL}/${DEFAULT_FAL_EDIT_SUBPATH}`],
capabilities: {
generate: {
maxCount: 4,
supportsSize: true,
supportsAspectRatio: true,
supportsResolution: true,
},
edit: {
enabled: true,
maxCount: 4,
maxInputImages: 1,
supportsSize: true,
supportsAspectRatio: false,
supportsResolution: true,
},
geometry: {
sizes: [...FAL_SUPPORTED_SIZES],
aspectRatios: [...FAL_SUPPORTED_ASPECT_RATIOS],
resolutions: ["1K", "2K", "4K"],
},
},
async generateImage(req) {
const auth = await resolveApiKeyForProvider({
provider: "fal",
@@ -128,18 +215,22 @@ export function buildFalImageGenerationProvider(): ImageGenerationProviderPlugin
throw new Error("fal image generation currently supports at most one reference image");
}
const hasInputImages = (req.inputImages?.length ?? 0) > 0;
const imageSize = resolveFalImageSize({
size: req.size,
resolution: req.resolution,
aspectRatio: req.aspectRatio,
hasInputImages,
});
const model = ensureFalModelPath(req.model, hasInputImages);
const requestBody: Record<string, unknown> = {
prompt: req.prompt,
num_images: req.count ?? 1,
output_format: DEFAULT_OUTPUT_FORMAT,
};
if (imageSize !== undefined) {
requestBody.image_size = imageSize;
}
if (hasInputImages) {
const [input] = req.inputImages ?? [];
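The geometry math the fal provider relies on can be exercised on its own. This mirrors `aspectRatioToDimensions` from the hunk above (a self-contained sketch, not the module's export):

```typescript
// Fit a W:H ratio inside a box whose long edge is `edge` pixels,
// mirroring the fal provider's aspectRatioToDimensions above.
function aspectRatioToDimensions(
  aspectRatio: string,
  edge: number,
): { width: number; height: number } {
  const match = /^(\d+):(\d+)$/u.exec(aspectRatio.trim());
  if (!match) {
    throw new Error(`Invalid fal aspect ratio: ${aspectRatio}`);
  }
  const w = Number.parseInt(match[1] ?? "", 10);
  const h = Number.parseInt(match[2] ?? "", 10);
  if (!Number.isFinite(w) || !Number.isFinite(h) || w <= 0 || h <= 0) {
    throw new Error(`Invalid fal aspect ratio: ${aspectRatio}`);
  }
  return w >= h
    ? { width: edge, height: Math.max(1, Math.round((edge * h) / w)) }
    : { width: Math.max(1, Math.round((edge * w) / h)), height: edge };
}

// "2K" resolves to a 2048px long edge, so 9:16 at 2K yields 1152x2048 —
// the exact image_size body the fal test above asserts.
const portrait = aspectRatioToDimensions("9:16", 2048);
```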

View File

@@ -197,7 +197,6 @@ describe("Google image-generation provider", () => {
generationConfig: {
responseModalities: ["TEXT", "IMAGE"],
imageConfig: {
imageSize: "4K",
},
},
@@ -205,4 +204,62 @@
}),
);
});
it("forwards explicit aspect ratio without forcing a default when size is omitted", async () => {
vi.spyOn(modelAuth, "resolveApiKeyForProvider").mockResolvedValue({
apiKey: "google-test-key",
source: "env",
mode: "api-key",
});
const fetchMock = vi.fn().mockResolvedValue({
ok: true,
json: async () => ({
candidates: [
{
content: {
parts: [
{
inlineData: {
mimeType: "image/png",
data: Buffer.from("png-data").toString("base64"),
},
},
],
},
},
],
}),
});
vi.stubGlobal("fetch", fetchMock);
const provider = buildGoogleImageGenerationProvider();
await provider.generateImage({
provider: "google",
model: "gemini-3-pro-image-preview",
prompt: "portrait photo",
cfg: {},
aspectRatio: "9:16",
});
expect(fetchMock).toHaveBeenCalledWith(
"https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent",
expect.objectContaining({
method: "POST",
body: JSON.stringify({
contents: [
{
role: "user",
parts: [{ text: "portrait photo" }],
},
],
generationConfig: {
responseModalities: ["TEXT", "IMAGE"],
imageConfig: {
aspectRatio: "9:16",
},
},
}),
}),
);
});
});

View File

@@ -11,7 +11,25 @@ import type { ImageGenerationProviderPlugin } from "../../plugins/types.js";
const DEFAULT_GOOGLE_IMAGE_BASE_URL = "https://generativelanguage.googleapis.com/v1beta";
const DEFAULT_GOOGLE_IMAGE_MODEL = "gemini-3.1-flash-image-preview";
const DEFAULT_OUTPUT_MIME = "image/png";
const GOOGLE_SUPPORTED_SIZES = [
"1024x1024",
"1024x1536",
"1536x1024",
"1024x1792",
"1792x1024",
] as const;
const GOOGLE_SUPPORTED_ASPECT_RATIOS = [
"1:1",
"2:3",
"3:2",
"3:4",
"4:3",
"4:5",
"5:4",
"9:16",
"16:9",
"21:9",
] as const;
type GoogleInlineDataPart = {
mimeType?: string;
@@ -46,7 +64,7 @@ function mapSizeToImageConfig(
): { aspectRatio?: string; imageSize?: "2K" | "4K" } | undefined {
const trimmed = size?.trim();
if (!trimmed) {
return undefined;
}
const normalized = trimmed.toLowerCase();
@@ -81,8 +99,27 @@ export function buildGoogleImageGenerationProvider(): ImageGenerationProviderPlu
label: "Google",
defaultModel: DEFAULT_GOOGLE_IMAGE_MODEL,
models: [DEFAULT_GOOGLE_IMAGE_MODEL, "gemini-3-pro-image-preview"],
capabilities: {
generate: {
maxCount: 4,
supportsSize: true,
supportsAspectRatio: true,
supportsResolution: true,
},
edit: {
enabled: true,
maxCount: 4,
maxInputImages: 5,
supportsSize: true,
supportsAspectRatio: true,
supportsResolution: true,
},
geometry: {
sizes: [...GOOGLE_SUPPORTED_SIZES],
aspectRatios: [...GOOGLE_SUPPORTED_ASPECT_RATIOS],
resolutions: ["1K", "2K", "4K"],
},
},
async generateImage(req) {
const auth = await resolveApiKeyForProvider({
provider: "google",
@@ -111,6 +148,7 @@ export function buildGoogleImageGenerationProvider(): ImageGenerationProviderPlu
}));
const resolvedImageConfig = {
...imageConfig,
...(req.aspectRatio?.trim() ? { aspectRatio: req.aspectRatio.trim() } : {}),
...(req.resolution ? { imageSize: req.resolution } : {}),
};
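The merge above gives an explicit request `aspectRatio` or `resolution` precedence over anything derived from `size`, because later spreads win. A minimal sketch of that precedence (hypothetical helper name and pared-down types):

```typescript
// Sketch of the imageConfig merge: size-derived values come first,
// then explicit request overrides win via later spreads.
type SketchImageConfig = { aspectRatio?: string; imageSize?: "1K" | "2K" | "4K" };

function resolveImageConfig(
  sizeDerived: SketchImageConfig | undefined,
  req: { aspectRatio?: string; resolution?: "1K" | "2K" | "4K" },
): SketchImageConfig {
  return {
    ...sizeDerived,
    ...(req.aspectRatio?.trim() ? { aspectRatio: req.aspectRatio.trim() } : {}),
    ...(req.resolution ? { imageSize: req.resolution } : {}),
  };
}

// An explicit 9:16 request overrides a size-derived 1:1.
const merged = resolveImageConfig({ aspectRatio: "1:1" }, { aspectRatio: "9:16" });
```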

View File

@@ -5,6 +5,7 @@ const DEFAULT_OPENAI_IMAGE_BASE_URL = "https://api.openai.com/v1";
const DEFAULT_OPENAI_IMAGE_MODEL = "gpt-image-1";
const DEFAULT_OUTPUT_MIME = "image/png";
const DEFAULT_SIZE = "1024x1024";
const OPENAI_SUPPORTED_SIZES = ["1024x1024", "1024x1536", "1536x1024"] as const;
type OpenAIImageApiResponse = {
data?: Array<{
@@ -24,7 +25,25 @@ export function buildOpenAIImageGenerationProvider(): ImageGenerationProviderPlu
label: "OpenAI",
defaultModel: DEFAULT_OPENAI_IMAGE_MODEL,
models: [DEFAULT_OPENAI_IMAGE_MODEL],
capabilities: {
generate: {
maxCount: 4,
supportsSize: true,
supportsAspectRatio: false,
supportsResolution: false,
},
edit: {
enabled: false,
maxCount: 0,
maxInputImages: 0,
supportsSize: false,
supportsAspectRatio: false,
supportsResolution: false,
},
geometry: {
sizes: [...OPENAI_SUPPORTED_SIZES],
},
},
async generateImage(req) {
if ((req.inputImages?.length ?? 0) > 0) {
throw new Error("OpenAI image generation provider does not support reference-image edits");

View File

@@ -19,6 +19,10 @@ describe("image-generation runtime helpers", () => {
source: "test",
provider: {
id: "image-plugin",
capabilities: {
generate: {},
edit: { enabled: false },
},
async generateImage(req) {
seenAuthStore = req.authStore;
return {
@@ -76,7 +80,18 @@ describe("image-generation runtime helpers", () => {
id: "image-plugin",
defaultModel: "img-v1",
models: ["img-v1", "img-v2"],
capabilities: {
generate: {
supportsResolution: true,
},
edit: {
enabled: true,
maxInputImages: 3,
},
geometry: {
resolutions: ["1K", "2K"],
},
},
generateImage: async () => ({
images: [{ buffer: Buffer.from("x"), mimeType: "image/png" }],
}),
@@ -89,7 +104,18 @@ describe("image-generation runtime helpers", () => {
id: "image-plugin",
defaultModel: "img-v1",
models: ["img-v1", "img-v2"],
capabilities: {
generate: {
supportsResolution: true,
},
edit: {
enabled: true,
maxInputImages: 3,
},
geometry: {
resolutions: ["1K", "2K"],
},
},
},
]);
});

View File

@@ -25,6 +25,7 @@ export type GenerateImageParams = {
modelOverride?: string;
count?: number;
size?: string;
aspectRatio?: string;
resolution?: ImageGenerationResolution;
inputImages?: ImageGenerationSourceImage[];
};
@@ -142,6 +143,7 @@ export async function generateImage(
authStore: params.authStore,
count: params.count,
size: params.size,
aspectRatio: params.aspectRatio,
resolution: params.resolution,
inputImages: params.inputImages,
});

View File

@@ -27,6 +27,7 @@ export type ImageGenerationRequest = {
authStore?: AuthProfileStore;
count?: number;
size?: string;
aspectRatio?: string;
resolution?: ImageGenerationResolution;
inputImages?: ImageGenerationSourceImage[];
};
@@ -37,14 +38,36 @@ export type ImageGenerationResult = {
metadata?: Record<string, unknown>;
};
export type ImageGenerationModeCapabilities = {
maxCount?: number;
supportsSize?: boolean;
supportsAspectRatio?: boolean;
supportsResolution?: boolean;
};
export type ImageGenerationEditCapabilities = ImageGenerationModeCapabilities & {
enabled: boolean;
maxInputImages?: number;
};
export type ImageGenerationGeometryCapabilities = {
sizes?: string[];
aspectRatios?: string[];
resolutions?: ImageGenerationResolution[];
};
export type ImageGenerationProviderCapabilities = {
generate: ImageGenerationModeCapabilities;
edit: ImageGenerationEditCapabilities;
geometry?: ImageGenerationGeometryCapabilities;
};
export type ImageGenerationProvider = {
id: string;
aliases?: string[];
label?: string;
defaultModel?: string;
models?: string[];
capabilities: ImageGenerationProviderCapabilities;
generateImage: (req: ImageGenerationRequest) => Promise<ImageGenerationResult>;
};
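A consumer of the new capabilities shape might gate requests like this. The validator below is a hypothetical sketch in the spirit of the `validateImageGenerationCapabilities` call used by the tool (whose implementation is not shown in this diff); the types are pared-down copies of the ones above:

```typescript
// Pared-down capability types plus a hypothetical validator that picks
// the generate or edit mode and enforces its declared limits.
type ModeCaps = {
  maxCount?: number;
  supportsAspectRatio?: boolean;
};
type Caps = {
  generate: ModeCaps;
  edit: ModeCaps & { enabled: boolean; maxInputImages?: number };
};

function validateRequest(
  caps: Caps,
  req: { count?: number; inputImageCount?: number; aspectRatio?: string },
): void {
  const editing = (req.inputImageCount ?? 0) > 0;
  const mode = editing ? caps.edit : caps.generate;
  if (editing && !caps.edit.enabled) {
    throw new Error("provider does not support image editing");
  }
  if (
    editing &&
    caps.edit.maxInputImages !== undefined &&
    (req.inputImageCount ?? 0) > caps.edit.maxInputImages
  ) {
    throw new Error("too many input images");
  }
  if (mode.maxCount !== undefined && (req.count ?? 1) > mode.maxCount) {
    throw new Error("count exceeds provider maximum");
  }
  if (req.aspectRatio && mode.supportsAspectRatio === false) {
    throw new Error("provider does not support aspectRatio");
  }
}

// Mirrors fal's declared capabilities above: edits allow one input image
// and no aspectRatio override.
const falLikeCaps: Caps = {
  generate: { maxCount: 4, supportsAspectRatio: true },
  edit: { enabled: true, maxCount: 4, maxInputImages: 1, supportsAspectRatio: false },
};

let editAspectError = "";
try {
  validateRequest(falLikeCaps, { inputImageCount: 1, aspectRatio: "16:9" });
} catch (err) {
  editAspectError = err instanceof Error ? err.message : String(err);
}
```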