---
summary: "Design for an opt-in Firecrawl extension that adds search/scrape value without hardwiring Firecrawl into core defaults"
read_when:
  - Designing Firecrawl integration work
  - Evaluating web_search/web_fetch plugin extension surfaces
  - Deciding whether Firecrawl belongs in core or as an extension
title: "Firecrawl Extension Design"
---

# Firecrawl Extension Design

## Goal

Ship Firecrawl as an **opt-in extension** that adds:

- explicit Firecrawl tools for agents,
- optional Firecrawl-backed `web_search` integration,
- self-hosted support,
- stronger security defaults than the current core fallback path,

without pushing Firecrawl into the default setup/onboarding path.

## Why this shape

Recent Firecrawl issues/PRs cluster into three buckets:

1. **Release/schema drift**
   - Several releases' config schemas rejected `tools.web.fetch.firecrawl` even though the docs and runtime code supported it.
2. **Security hardening**
   - Current `fetchFirecrawlContent()` still posts to the Firecrawl endpoint with raw `fetch()`, while the main web-fetch path uses the SSRF guard.
3. **Product pressure**
   - Users want Firecrawl-native search/scrape flows, especially for self-hosted/private setups.
   - Maintainers explicitly rejected wiring Firecrawl deeply into core defaults, setup flow, and browser behavior.

That combination argues for an extension, not more Firecrawl-specific logic in the default core path.

## Design principles

- **Opt-in, vendor-scoped**: no auto-enable, no setup hijack, no default tool-profile widening.
- **Extension owns Firecrawl-specific config**: prefer plugin config over growing `tools.web.*` again.
- **Useful on day one**: works even if core `web_search` / `web_fetch` extension surfaces stay unchanged.
- **Security-first**: endpoint fetches use the same guarded networking posture as other web tools.
- **Self-hosted-friendly**: config + env fallback, explicit base URL, no hosted-only assumptions.

## Proposed extension

Plugin id: `firecrawl`

### MVP capabilities

Register explicit tools:

- `firecrawl_search`
- `firecrawl_scrape`

Optional later:

- `firecrawl_crawl`
- `firecrawl_map`

Do **not** add Firecrawl browser automation in the first version. That was the part of PR #32543 that pulled Firecrawl too far into core behavior and raised the most maintainer concern.

## Config shape

Use plugin-scoped config:

```json5
{
  plugins: {
    entries: {
      firecrawl: {
        enabled: true,
        config: {
          apiKey: "FIRECRAWL_API_KEY", // placeholder; omit to fall back to the FIRECRAWL_API_KEY env var
          baseUrl: "https://api.firecrawl.dev",
          timeoutSeconds: 60,
          maxAgeMs: 172800000, // 2 days
          proxy: "auto",
          storeInCache: true,
          onlyMainContent: true,
          search: {
            enabled: true,
            defaultLimit: 5,
            sources: ["web"],
            categories: [],
            scrapeResults: false,
          },
          scrape: {
            formats: ["markdown"],
            fallbackForWebFetchLikeUse: false,
          },
        },
      },
    },
  },
}
```
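A minimal TypeScript sketch of the plugin config schema, assuming the field names in the JSON5 example above and treating the example values as defaults. `FirecrawlPluginConfig`, `FIRECRAWL_DEFAULTS`, and `applyFirecrawlDefaults` are hypothetical names, not existing OpenClaw APIs.

```typescript
// Hypothetical types mirroring the JSON5 example; not an existing OpenClaw schema.
interface FirecrawlSearchConfig {
  enabled: boolean;
  defaultLimit: number;
  sources: string[];
  categories: string[];
  scrapeResults: boolean;
}

interface FirecrawlScrapeConfig {
  formats: string[];
  fallbackForWebFetchLikeUse: boolean;
}

interface FirecrawlPluginConfig {
  apiKey?: string; // optional: may come from FIRECRAWL_API_KEY instead
  baseUrl?: string; // optional: may come from FIRECRAWL_BASE_URL instead
  timeoutSeconds: number;
  maxAgeMs: number;
  proxy: string;
  storeInCache: boolean;
  onlyMainContent: boolean;
  search: FirecrawlSearchConfig;
  scrape: FirecrawlScrapeConfig;
}

// What a user may write: every field optional, nested sections partial too.
type FirecrawlPluginConfigInput = Partial<Omit<FirecrawlPluginConfig, "search" | "scrape">> & {
  search?: Partial<FirecrawlSearchConfig>;
  scrape?: Partial<FirecrawlScrapeConfig>;
};

// Assumed defaults, taken from the example values above.
const FIRECRAWL_DEFAULTS: FirecrawlPluginConfig = {
  timeoutSeconds: 60,
  maxAgeMs: 2 * 24 * 60 * 60 * 1000, // 172800000 ms = 2 days
  proxy: "auto",
  storeInCache: true,
  onlyMainContent: true,
  search: { enabled: true, defaultLimit: 5, sources: ["web"], categories: [], scrapeResults: false },
  scrape: { formats: ["markdown"], fallbackForWebFetchLikeUse: false },
};

// Shallow-merge user input over defaults, one level deep for the nested sections.
function applyFirecrawlDefaults(input: FirecrawlPluginConfigInput = {}): FirecrawlPluginConfig {
  return {
    ...FIRECRAWL_DEFAULTS,
    ...input,
    search: { ...FIRECRAWL_DEFAULTS.search, ...input.search },
    scrape: { ...FIRECRAWL_DEFAULTS.scrape, ...input.scrape },
  };
}
```

For example, `applyFirecrawlDefaults({ search: { defaultLimit: 10 } })` keeps every other default. Arrays are replaced rather than merged, so setting `search.sources` overrides `["web"]` entirely.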

### Credential resolution

Precedence:

1. `plugins.entries.firecrawl.config.apiKey`
2. `FIRECRAWL_API_KEY`

Base URL precedence:

1. `plugins.entries.firecrawl.config.baseUrl`
2. `FIRECRAWL_BASE_URL`
3. `https://api.firecrawl.dev`
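Both precedence lists can be read as one small resolver. This is a sketch under stated assumptions: `resolveFirecrawlEndpoint` is a hypothetical name, the environment is passed in explicitly (callers would pass `process.env`), blank strings count as unset, and trailing slashes are stripped so endpoint paths join predictably.

```typescript
const DEFAULT_FIRECRAWL_BASE_URL = "https://api.firecrawl.dev";

interface FirecrawlEndpointConfig {
  apiKey?: string;
  baseUrl?: string;
}

interface ResolvedFirecrawlEndpoint {
  apiKey: string | undefined; // self-hosted deployments may run without a key
  baseUrl: string;
}

// Assumption: empty or whitespace-only values count as "not set", so a blank
// config entry does not shadow the environment variable.
function firstNonEmpty(...values: Array<string | undefined>): string | undefined {
  for (const value of values) {
    const trimmed = value?.trim();
    if (trimmed) return trimmed;
  }
  return undefined;
}

// Hypothetical resolver: plugin config first, then env, then the hosted default.
function resolveFirecrawlEndpoint(
  config: FirecrawlEndpointConfig | undefined,
  env: Record<string, string | undefined>,
): ResolvedFirecrawlEndpoint {
  const apiKey = firstNonEmpty(config?.apiKey, env.FIRECRAWL_API_KEY);
  const rawBaseUrl =
    firstNonEmpty(config?.baseUrl, env.FIRECRAWL_BASE_URL) ?? DEFAULT_FIRECRAWL_BASE_URL;
  // Normalize trailing slashes so "/v2/..." paths join the same way every time.
  const baseUrl = rawBaseUrl.replace(/\/+$/, "");
  return { apiKey, baseUrl };
}
```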

### Compatibility bridge

For the first release, the extension may also **read** existing core config at `tools.web.fetch.firecrawl.*` as a fallback source so existing users do not need to migrate immediately.

Write path stays plugin-local. Do not keep expanding core Firecrawl config surfaces.
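A sketch of the read-only bridge, assuming plugin-scoped values always win and legacy `tools.web.fetch.firecrawl.*` values only fill gaps (environment fallbacks would then apply after this merge, also an assumption). `BRIDGED_KEYS` is a guess at which legacy fields overlap with the plugin config. Returning `legacyKeys` lets the extension emit a migration hint without ever writing to core config.

```typescript
// Hypothetical compatibility bridge; names and the key list are assumptions.
interface OpenClawConfigLike {
  plugins?: { entries?: Record<string, { config?: Record<string, unknown> } | undefined> };
  tools?: { web?: { fetch?: { firecrawl?: Record<string, unknown> } } };
}

// Assumed overlap between legacy core fields and plugin config fields.
const BRIDGED_KEYS = ["apiKey", "baseUrl", "onlyMainContent", "maxAgeMs", "timeoutSeconds"] as const;

interface BridgedFirecrawlConfig {
  config: Record<string, unknown>;
  legacyKeys: string[]; // keys that were filled from tools.web.fetch.firecrawl.*
}

// Read-only: builds a fresh object and never mutates either source.
function readFirecrawlConfig(cfg: OpenClawConfigLike): BridgedFirecrawlConfig {
  const pluginCfg: Record<string, unknown> = cfg.plugins?.entries?.firecrawl?.config ?? {};
  const legacyCfg: Record<string, unknown> = cfg.tools?.web?.fetch?.firecrawl ?? {};
  const config: Record<string, unknown> = { ...pluginCfg };
  const legacyKeys: string[] = [];
  for (const key of BRIDGED_KEYS) {
    if (config[key] === undefined && legacyCfg[key] !== undefined) {
      config[key] = legacyCfg[key];
      legacyKeys.push(key);
    }
  }
  return { config, legacyKeys };
}
```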

## Tool design

### `firecrawl_search`

Inputs:

- `query`
- `limit`
- `sources`
- `categories`
- `scrapeResults`
- `timeoutSeconds`

Behavior:

- Calls Firecrawl `v2/search`
- Returns normalized OpenClaw-friendly result objects:
  - `title`
  - `url`
  - `snippet`
  - `source`
  - optional `content`
- Wraps result content as untrusted external content
- Cache key includes query + relevant provider params
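The normalization and cache-key rules can be sketched as follows. The response shape (hits grouped by source under `data`, with `description` or `snippet` text and optional `markdown` when results are scraped) is an assumption about Firecrawl's `v2/search` output that should be checked against Firecrawl's API reference. `wrapUntrusted` and its marker strings are placeholders for whatever wrapper `web_fetch` already uses.

```typescript
// Assumed Firecrawl v2/search response shape; verify against Firecrawl docs.
interface FirecrawlSearchHit {
  url?: string;
  title?: string;
  description?: string;
  snippet?: string;
  markdown?: string; // assumed present only when results are scraped
}

interface FirecrawlSearchResponse {
  success?: boolean;
  data?: Record<string, FirecrawlSearchHit[] | undefined>; // keyed by source, e.g. "web"
}

// Normalized result shape from the list above.
interface FirecrawlSearchResult {
  title: string;
  url: string;
  snippet: string;
  source: string;
  content?: string;
}

// Placeholder wrapper; the real extension should reuse web_fetch's wrapper.
function wrapUntrusted(text: string): string {
  return `<<<EXTERNAL_UNTRUSTED_CONTENT>>>\n${text}\n<<<END_EXTERNAL_UNTRUSTED_CONTENT>>>`;
}

function normalizeSearchResponse(resp: FirecrawlSearchResponse): FirecrawlSearchResult[] {
  const results: FirecrawlSearchResult[] = [];
  for (const [source, hits] of Object.entries(resp.data ?? {})) {
    for (const hit of hits ?? []) {
      if (!hit.url) continue; // a result without a URL is not actionable
      const result: FirecrawlSearchResult = {
        title: hit.title ?? hit.url,
        url: hit.url,
        snippet: hit.description ?? hit.snippet ?? "",
        source,
      };
      if (hit.markdown) result.content = wrapUntrusted(hit.markdown);
      results.push(result);
    }
  }
  return results;
}

interface FirecrawlSearchParams {
  query: string;
  limit: number;
  sources: string[];
  categories: string[];
  scrapeResults: boolean;
}

// Deterministic key: the same logical request maps to the same key regardless of list order.
function searchCacheKey(params: FirecrawlSearchParams, baseUrl: string): string {
  return JSON.stringify([
    "firecrawl_search",
    baseUrl,
    params.query.trim(),
    params.limit,
    [...params.sources].sort(),
    [...params.categories].sort(),
    params.scrapeResults,
  ]);
}
```

The key includes the resolved base URL so hosted and self-hosted results never share cache entries, sorts list parameters so equivalent requests hit the same entry, and leaves out `timeoutSeconds` because it does not change the result.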

Why explicit tool first:

- Works today without changing `tools.web.search.provider`
- Avoids the current provider schema and loader constraints (see Phase 2)
- Gives users Firecrawl value immediately

### `firecrawl_scrape`

Inputs:

- `url`
- `formats`
- `onlyMainContent`
- `maxAgeMs`
- `proxy`
- `storeInCache`
- `timeoutSeconds`

Behavior:

- Calls Firecrawl `v2/scrape`
- Returns markdown/text plus metadata:
  - `title`
  - `finalUrl`
  - `status`
  - `warning`
- Wraps extracted content the same way `web_fetch` does
- Follows the built-in web tools' caching semantics where practical
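A sketch of the request and response mapping, assuming Firecrawl's `v2/scrape` accepts `maxAge` and `timeout` in milliseconds and returns `data.markdown`, `data.warning`, and `data.metadata` with `title`, `url`/`sourceURL`, and `statusCode`; verify these field names against Firecrawl's API reference. The function names are hypothetical, and the HTTP call itself (which would go through the SSRF-guarded fetch) is left out so the mapping stays testable.

```typescript
interface ScrapeToolInput {
  url: string;
  formats?: string[];
  onlyMainContent?: boolean;
  maxAgeMs?: number;
  proxy?: string;
  storeInCache?: boolean;
  timeoutSeconds?: number;
}

interface PreparedRequest {
  endpoint: string;
  headers: Record<string, string>;
  body: string;
}

function buildScrapeRequest(input: ScrapeToolInput, baseUrl: string, apiKey?: string): PreparedRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey) headers.Authorization = `Bearer ${apiKey}`; // self-hosted may need no key
  const body = {
    url: input.url,
    formats: input.formats ?? ["markdown"],
    onlyMainContent: input.onlyMainContent,
    maxAge: input.maxAgeMs, // assumed Firecrawl field name (milliseconds)
    proxy: input.proxy,
    storeInCache: input.storeInCache,
    timeout: input.timeoutSeconds === undefined ? undefined : input.timeoutSeconds * 1000,
  };
  // Resolve against a slash-terminated base so a path prefix (e.g. /firecrawl) survives.
  const endpoint = new URL("v2/scrape", baseUrl.replace(/\/+$/, "") + "/").toString();
  // JSON.stringify drops undefined fields, leaving unset options to Firecrawl's defaults.
  return { endpoint, headers, body: JSON.stringify(body) };
}

interface FirecrawlScrapeMetadata {
  title?: string;
  url?: string;
  sourceURL?: string;
  statusCode?: number;
}

interface FirecrawlScrapeResponse {
  success?: boolean;
  error?: string;
  data?: { markdown?: string; warning?: string; metadata?: FirecrawlScrapeMetadata };
}

interface ScrapeToolResult {
  text: string;
  title?: string;
  finalUrl: string;
  status?: number;
  warning?: string;
}

// Placeholder wrapper; the real extension should reuse web_fetch's wrapper.
function wrapUntrusted(text: string): string {
  return `<<<EXTERNAL_UNTRUSTED_CONTENT>>>\n${text}\n<<<END_EXTERNAL_UNTRUSTED_CONTENT>>>`;
}

function mapScrapeResponse(requestedUrl: string, resp: FirecrawlScrapeResponse): ScrapeToolResult {
  if (resp.success === false || !resp.data) {
    throw new Error(`Firecrawl scrape failed: ${resp.error ?? "no data returned"}`);
  }
  const meta: FirecrawlScrapeMetadata = resp.data.metadata ?? {};
  return {
    text: wrapUntrusted(resp.data.markdown ?? ""),
    title: meta.title,
    finalUrl: meta.url ?? meta.sourceURL ?? requestedUrl,
    status: meta.statusCode,
    warning: resp.data.warning,
  };
}
```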

Why explicit scrape tool:

- Sidesteps the unresolved `Readability -> Firecrawl -> basic HTML cleanup` ordering bug in core `web_fetch`
- Gives users a deterministic “always use Firecrawl” path for JS-heavy/bot-protected sites

## What the extension should not do

- No auto-adding `browser`, `web_search`, or `web_fetch` to `tools.alsoAllow`
- No default onboarding step in `openclaw setup`
- No Firecrawl-specific browser session lifecycle in core
- No change to built-in `web_fetch` fallback semantics in the extension MVP

## Phase plan

### Phase 1: extension-only, no core schema changes

Implement:

- `extensions/firecrawl/`
- plugin config schema
- `firecrawl_search`
- `firecrawl_scrape`
- tests for config resolution, endpoint selection, caching, error handling, and SSRF guard usage

This phase is enough to ship real user value.

### Phase 2: optional `web_search` provider integration

Support `tools.web.search.provider = "firecrawl"` only after fixing two core constraints:

1. `src/plugins/web-search-providers.ts` must load configured/installed web-search-provider plugins instead of a hardcoded bundled list.
2. `src/config/types.tools.ts` and `src/config/zod-schema.agent-runtime.ts` must stop hardcoding the provider enum in a way that blocks plugin-registered ids.

Recommended shape:

- keep built-in providers documented,
- allow any registered plugin provider id at runtime,
- validate provider-specific config via the provider plugin or a generic provider bag.
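One way to drop the hardcoded enum, sketched under assumptions: the config schema accepts any string for `tools.web.search.provider`, and a runtime registry validates it once plugins have loaded. `WebSearchProviderRegistry` is a hypothetical name, and the built-in ids used in the usage check are placeholders, not the real provider list.

```typescript
type ValidationResult = { ok: true } | { ok: false; error: string };

// Hypothetical registry: built-in ids are seeded up front, plugin ids added at load time.
class WebSearchProviderRegistry {
  private readonly ids = new Set<string>();

  constructor(builtInIds: readonly string[]) {
    for (const id of builtInIds) this.ids.add(id);
  }

  // Called by provider plugins (e.g. the Firecrawl extension) while loading.
  register(id: string): void {
    if (this.ids.has(id)) throw new Error(`web_search provider "${id}" is already registered`);
    this.ids.add(id);
  }

  validate(provider: string | undefined): ValidationResult {
    if (provider === undefined) return { ok: true }; // unset: leave the choice to core
    if (this.ids.has(provider)) return { ok: true };
    const known = Array.from(this.ids).sort().join(", ");
    return { ok: false, error: `unknown web_search provider "${provider}" (known: ${known})` };
  }
}
```

Validating after plugin load keeps the error message accurate: it lists exactly the ids that are installed, instead of a static enum that goes stale whenever a plugin adds a provider.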

### Phase 3: optional `web_fetch` provider capability

Do this only if maintainers want vendor-specific fetch backends to participate in `web_fetch`.

Needed core addition:

- `registerWebFetchProvider` or equivalent fetch-backend extension surface

Without that capability, the extension should keep `firecrawl_scrape` as an explicit tool rather than trying to patch built-in `web_fetch`.
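A hypothetical shape for such a fetch-backend surface. Neither these interfaces nor `registerWebFetchProvider` exist in core today; the sketch only illustrates the constraint that a registered backend is used when explicitly configured, so built-in `web_fetch` behavior stays unchanged by default.

```typescript
interface WebFetchRequest {
  url: string;
  timeoutMs: number;
}

interface WebFetchResult {
  text: string;
  finalUrl: string;
  status?: number;
}

interface WebFetchProvider {
  id: string;
  fetch(req: WebFetchRequest): Promise<WebFetchResult>;
}

const webFetchProviders = new Map<string, WebFetchProvider>();

// Hypothetical registration hook a plugin would call at load time.
function registerWebFetchProvider(provider: WebFetchProvider): void {
  if (webFetchProviders.has(provider.id)) {
    throw new Error(`duplicate web_fetch provider: ${provider.id}`);
  }
  webFetchProviders.set(provider.id, provider);
}

// With no configured provider id, the built-in backend runs exactly as before.
async function runWebFetch(
  req: WebFetchRequest,
  configuredProviderId: string | undefined,
  builtIn: WebFetchProvider,
): Promise<WebFetchResult> {
  if (!configuredProviderId) return builtIn.fetch(req);
  const provider = webFetchProviders.get(configuredProviderId);
  if (!provider) throw new Error(`web_fetch provider "${configuredProviderId}" is not registered`);
  return provider.fetch(req);
}
```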

## Security requirements

The extension must treat Firecrawl as a **trusted operator-configured endpoint**, but still harden transport:

- Use SSRF-guarded fetch for the Firecrawl endpoint call, not raw `fetch()`
- Preserve self-hosted/private-network compatibility using the same trusted-web-tools endpoint policy used elsewhere
- Never log the API key
- Keep endpoint/base URL resolution explicit and predictable
- Treat Firecrawl-returned content as untrusted external content

This mirrors the intent behind the SSRF hardening PRs without assuming Firecrawl is a hostile multi-tenant surface.
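Two of these requirements as a hedged sketch: a target check that trusts the operator-configured Firecrawl origin (even when it is private, for self-hosted setups) while refusing other private targets such as a redirect to an internal address, and a redaction helper for log lines. These helpers are hypothetical and do not replace the core SSRF guard; the literal check below covers only `localhost` and IPv4 literals, while a real guard must also check DNS-resolved addresses and IPv6.

```typescript
// Rough hostname-literal check for loopback/private IPv4 ranges (sketch only).
function isPrivateHostLiteral(hostname: string): boolean {
  const host = hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return true;
  const m = /^(\d{1,3})\.(\d{1,3})\.\d{1,3}\.\d{1,3}$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

// Trust the operator-configured origin (possibly private, for self-hosted setups);
// refuse non-HTTP schemes and any other private target, such as a redirect.
function isAllowedFirecrawlTarget(target: string, configuredBaseUrl: string): boolean {
  const url = new URL(target);
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  if (url.origin === new URL(configuredBaseUrl).origin) return true;
  return !isPrivateHostLiteral(url.hostname);
}

// Strip the API key from anything that might reach logs or error messages.
function redactSecret(message: string, secret: string | undefined): string {
  return secret ? message.split(secret).join("[REDACTED]") : message;
}
```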

## Why not a skill

The repo already closed a Firecrawl skill PR in favor of ClawHub distribution. That is fine for optional user-installed prompt workflows, but it does not solve:

- deterministic tool availability,
- provider-grade config/credential handling,
- self-hosted endpoint support,
- caching,
- stable typed outputs,
- security review on network behavior.

This belongs as an extension, not a prompt-only skill.

## Success criteria

- Users can install/enable one extension and get reliable Firecrawl search/scrape without touching core defaults.
- Self-hosted Firecrawl works with config/env fallback.
- Extension endpoint fetches use guarded networking.
- No new Firecrawl-specific core onboarding/default behavior.
- Core can later adopt plugin-native `web_search` / `web_fetch` extension surfaces without redesigning the extension.

## Recommended implementation order

1. Build `firecrawl_scrape`
2. Build `firecrawl_search`
3. Add docs and examples
4. If desired, generalize `web_search` provider loading so the extension can back `web_search`
5. Only then consider a true `web_fetch` provider capability