feat: add firecrawl onboarding search plugin

This commit is contained in:
Peter Steinberger 2026-03-16 03:38:51 +00:00
parent be8fef3840
commit aa28d1c711
26 changed files with 1593 additions and 92 deletions

View File

@ -11,6 +11,7 @@ Docs: https://docs.openclaw.ai
- Gateway/health monitor: add configurable stale-event thresholds and restart limits, plus per-channel and per-account `healthMonitor.enabled` overrides, while keeping the existing global disable path on `gateway.channelHealthCheckMinutes=0`. (#42107) Thanks @rstar327.
- Feishu/cards: add identity-aware structured card headers and note footers for Feishu replies and direct sends, while keeping that presentation wired through the shared outbound identity path. (#29938) Thanks @nszhsl.
- Feishu/streaming: add `onReasoningStream` and `onReasoningEnd` support to streaming cards, so `/reasoning stream` renders thinking tokens as markdown blockquotes in the same card — matching the Telegram channel's reasoning lane behavior. (#46029)
- Web tools/Firecrawl: add Firecrawl as an `onboard`/configure search provider via a bundled plugin, expose explicit `firecrawl_search` and `firecrawl_scrape` tools, and align core `web_fetch` fallback behavior with Firecrawl base-URL/env fallback plus guarded endpoint fetches.
- Refactor/channels: remove the legacy channel shim directories and point channel-specific imports directly at the extension-owned implementations. (#45967) Thanks @scoootscooob.
- Android/nodes: add `callLog.search` plus shared Call Log permission wiring so Android nodes can search recent call history through the gateway. (#44073) Thanks @lxk7280.
- Docs/Zalo: clarify the Marketplace-bot support matrix and config guidance so the Zalo channel docs match current Bot Creator behavior more closely. (#47552) Thanks @No898.

View File

@ -0,0 +1,260 @@
---
summary: "Design for an opt-in Firecrawl extension that adds search/scrape value without hardwiring Firecrawl into core defaults"
read_when:
- Designing Firecrawl integration work
- Evaluating web_search/web_fetch plugin seams
- Deciding whether Firecrawl belongs in core or as an extension
title: "Firecrawl Extension Design"
---
# Firecrawl Extension Design
## Goal
Ship Firecrawl as an **opt-in extension** that adds:
- explicit Firecrawl tools for agents,
- optional Firecrawl-backed `web_search` integration,
- self-hosted support,
- stronger security defaults than the current core fallback path,
all without pushing Firecrawl into the default setup/onboarding path.
## Why this shape
Recent Firecrawl issues/PRs cluster into three buckets:
1. **Release/schema drift**
- Several releases rejected `tools.web.fetch.firecrawl` even though docs and runtime code supported it.
2. **Security hardening**
- Current `fetchFirecrawlContent()` still posts to the Firecrawl endpoint with raw `fetch()`, while the main web-fetch path uses the SSRF guard.
3. **Product pressure**
- Users want Firecrawl-native search/scrape flows, especially for self-hosted/private setups.
- Maintainers explicitly rejected wiring Firecrawl deeply into core defaults, setup flow, and browser behavior.
That combination argues for an extension, not more Firecrawl-specific logic in the default core path.
## Design principles
- **Opt-in, vendor-scoped**: no auto-enable, no setup hijack, no default tool-profile widening.
- **Extension owns Firecrawl-specific config**: prefer plugin config over growing `tools.web.*` again.
- **Useful on day one**: works even if core `web_search` / `web_fetch` seams stay unchanged.
- **Security-first**: endpoint fetches use the same guarded networking posture as other web tools.
- **Self-hosted-friendly**: config + env fallback, explicit base URL, no hosted-only assumptions.
## Proposed extension
Plugin id: `firecrawl`
### MVP capabilities
Register explicit tools:
- `firecrawl_search`
- `firecrawl_scrape`
Optional later:
- `firecrawl_crawl`
- `firecrawl_map`
Do **not** add Firecrawl browser automation in the first version. That was the part of PR #32543 that pulled Firecrawl too far into core behavior and raised the most maintainer concern.
## Config shape
Use plugin-scoped config:
```json5
{
plugins: {
entries: {
firecrawl: {
enabled: true,
config: {
apiKey: "FIRECRAWL_API_KEY",
baseUrl: "https://api.firecrawl.dev",
timeoutSeconds: 60,
maxAgeMs: 172800000,
proxy: "auto",
storeInCache: true,
onlyMainContent: true,
search: {
enabled: true,
defaultLimit: 5,
sources: ["web"],
categories: [],
scrapeResults: false,
},
scrape: {
formats: ["markdown"],
fallbackForWebFetchLikeUse: false,
},
},
},
},
},
}
```
### Credential resolution
Precedence:
1. `plugins.entries.firecrawl.config.apiKey`
2. `FIRECRAWL_API_KEY`
Base URL precedence:
1. `plugins.entries.firecrawl.config.baseUrl`
2. `FIRECRAWL_BASE_URL`
3. `https://api.firecrawl.dev`
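The precedence above can be sketched as two small resolvers. This is an illustrative sketch only — `resolveApiKey`/`resolveBaseUrl` and the config shape are assumptions for this doc, not the plugin's actual helpers:

```typescript
// Hypothetical precedence sketch: plugin config first, then env, then default.
type FirecrawlPluginConfig = { apiKey?: string; baseUrl?: string };
type Env = Record<string, string | undefined>;

function resolveApiKey(cfg: FirecrawlPluginConfig, env: Env): string | undefined {
  // 1. plugins.entries.firecrawl.config.apiKey, 2. FIRECRAWL_API_KEY
  return cfg.apiKey?.trim() || env.FIRECRAWL_API_KEY?.trim() || undefined;
}

function resolveBaseUrl(cfg: FirecrawlPluginConfig, env: Env): string {
  // 1. config.baseUrl, 2. FIRECRAWL_BASE_URL, 3. hosted default
  return cfg.baseUrl?.trim() || env.FIRECRAWL_BASE_URL?.trim() || "https://api.firecrawl.dev";
}
```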
### Compatibility bridge
For the first release, the extension may also **read** existing core config at `tools.web.fetch.firecrawl.*` as a fallback source so existing users do not need to migrate immediately.
Write path stays plugin-local. Do not keep expanding core Firecrawl config surfaces.
## Tool design
### `firecrawl_search`
Inputs:
- `query`
- `limit`
- `sources`
- `categories`
- `scrapeResults`
- `timeoutSeconds`
Behavior:
- Calls Firecrawl `v2/search`
- Returns normalized OpenClaw-friendly result objects:
- `title`
- `url`
- `snippet`
- `source`
- optional `content`
- Wraps result content as untrusted external content
- Cache key includes query + relevant provider params
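A cache key that folds in the query plus the result-affecting provider params could look like the following sketch. The helper name and defaults are illustrative, not the plugin's real implementation:

```typescript
// Hypothetical cache key: normalize the query, sort list params so
// equivalent requests hit the same cache entry.
function searchCacheKey(params: {
  query: string;
  limit?: number;
  sources?: string[];
  categories?: string[];
  scrapeResults?: boolean;
}): string {
  return JSON.stringify({
    query: params.query.trim().toLowerCase(),
    limit: params.limit ?? 5,
    sources: [...(params.sources ?? ["web"])].sort(),
    categories: [...(params.categories ?? [])].sort(),
    scrapeResults: params.scrapeResults ?? false,
  });
}
```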
Why explicit tool first:
- Works today without changing `tools.web.search.provider`
- Avoids current schema/loader constraints
- Gives users Firecrawl value immediately
### `firecrawl_scrape`
Inputs:
- `url`
- `formats`
- `onlyMainContent`
- `maxAgeMs`
- `proxy`
- `storeInCache`
- `timeoutSeconds`
Behavior:
- Calls Firecrawl `v2/scrape`
- Returns markdown/text plus metadata:
- `title`
- `finalUrl`
- `status`
- `warning`
- Wraps extracted content the same way `web_fetch` does
- Shares cache semantics with web tool expectations where practical
Why explicit scrape tool:
- Sidesteps the unresolved `Readability -> Firecrawl -> basic HTML cleanup` ordering bug in core `web_fetch`
- Gives users a deterministic “always use Firecrawl” path for JS-heavy/bot-protected sites
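The deterministic path can be illustrated by how the result might be shaped per `extractMode` and a char budget. This is a minimal sketch under assumed field names (`text`, `truncated`), not the actual tool output contract:

```typescript
// Illustrative result shaping: markdown passes through, text mode strips
// heading markers, and both respect a maxChars budget.
function shapeScrapeResult(
  markdown: string,
  extractMode: "markdown" | "text",
  maxChars: number,
): { text: string; truncated: boolean } {
  const text =
    extractMode === "text"
      ? markdown.replace(/^#+\s*/gm, "").replace(/\n{3,}/g, "\n\n").trim()
      : markdown;
  return {
    text: text.length > maxChars ? text.slice(0, maxChars) : text,
    truncated: text.length > maxChars,
  };
}
```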
## What the extension should not do
- No auto-adding `browser`, `web_search`, or `web_fetch` to `tools.alsoAllow`
- No default onboarding step in `openclaw setup`
- No Firecrawl-specific browser session lifecycle in core
- No change to built-in `web_fetch` fallback semantics in the extension MVP
## Phase plan
### Phase 1: extension-only, no core schema changes
Implement:
- `extensions/firecrawl/`
- plugin config schema
- `firecrawl_search`
- `firecrawl_scrape`
- tests for config resolution, endpoint selection, caching, error handling, and SSRF guard usage
This phase is enough to ship real user value.
### Phase 2: optional `web_search` provider integration
Support `tools.web.search.provider = "firecrawl"` only after fixing two core constraints:
1. `src/plugins/web-search-providers.ts` must load configured/installed web-search-provider plugins instead of a hardcoded bundled list.
2. `src/config/types.tools.ts` and `src/config/zod-schema.agent-runtime.ts` must stop hardcoding the provider enum in a way that blocks plugin-registered ids.
Recommended shape:
- keep built-in providers documented,
- allow any registered plugin provider id at runtime,
- validate provider-specific config via the provider plugin or a generic provider bag.
### Phase 3: optional `web_fetch` provider seam
Do this only if maintainers want vendor-specific fetch backends to participate in `web_fetch`.
Needed core addition:
- `registerWebFetchProvider` or equivalent fetch-backend seam
Without that seam, the extension should keep `firecrawl_scrape` as an explicit tool rather than trying to patch built-in `web_fetch`.
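For concreteness, a minimal version of such a seam might look like the sketch below. This interface does not exist in core today — Phase 3 is conditional on maintainers wanting it — so every name here is hypothetical:

```typescript
// Hypothetical fetch-backend seam: vendors register a provider by id,
// core rejects duplicate registrations.
type WebFetchProvider = {
  id: string;
  fetchContent(url: string): Promise<{ text: string; finalUrl: string }>;
};

const providers = new Map<string, WebFetchProvider>();

function registerWebFetchProvider(provider: WebFetchProvider): void {
  if (providers.has(provider.id)) {
    throw new Error(`duplicate web_fetch provider: ${provider.id}`);
  }
  providers.set(provider.id, provider);
}
```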
## Security requirements
The extension must treat Firecrawl as a **trusted operator-configured endpoint**, but still harden transport:
- Use SSRF-guarded fetch for the Firecrawl endpoint call, not raw `fetch()`
- Preserve self-hosted/private-network compatibility using the same trusted-web-tools endpoint policy used elsewhere
- Never log the API key
- Keep endpoint/base URL resolution explicit and predictable
- Treat Firecrawl-returned content as untrusted external content
This mirrors the intent behind the SSRF hardening PRs without assuming Firecrawl is a hostile multi-tenant surface.
## Why not a skill
The repo already closed a Firecrawl skill PR in favor of ClawHub distribution. That is fine for optional user-installed prompt workflows, but it does not solve:
- deterministic tool availability,
- provider-grade config/credential handling,
- self-hosted endpoint support,
- caching,
- stable typed outputs,
- security review on network behavior.
This belongs as an extension, not a prompt-only skill.
## Success criteria
- Users can install/enable one extension and get reliable Firecrawl search/scrape without touching core defaults.
- Self-hosted Firecrawl works with config/env fallback.
- Extension endpoint fetches use guarded networking.
- No new Firecrawl-specific core onboarding/default behavior.
- Core can later adopt plugin-native `web_search` / `web_fetch` seams without redesigning the extension.
## Recommended implementation order
1. Build `firecrawl_scrape`
2. Build `firecrawl_search`
3. Add docs and examples
4. If desired, generalize `web_search` provider loading so the extension can back `web_search`
5. Only then consider a true `web_fetch` provider seam

View File

@ -1,27 +1,71 @@
---
summary: "Firecrawl fallback for web_fetch (anti-bot + cached extraction)"
summary: "Firecrawl search, scrape, and web_fetch fallback"
read_when:
- You want Firecrawl-backed web extraction
- You need a Firecrawl API key
- You want Firecrawl as a web_search provider
- You want anti-bot extraction for web_fetch
title: "Firecrawl"
---
# Firecrawl
OpenClaw can use **Firecrawl** as a fallback extractor for `web_fetch`. It is a hosted
content extraction service that supports bot circumvention and caching, which helps
with JS-heavy sites or pages that block plain HTTP fetches.
OpenClaw can use **Firecrawl** in three ways:
- as the `web_search` provider
- as explicit plugin tools: `firecrawl_search` and `firecrawl_scrape`
- as a fallback extractor for `web_fetch`
It is a hosted extraction/search service that supports bot circumvention and caching,
which helps with JS-heavy sites or pages that block plain HTTP fetches.
## Get an API key
1. Create a Firecrawl account and generate an API key.
2. Store it in config or set `FIRECRAWL_API_KEY` in the gateway environment.
## Configure Firecrawl
## Configure Firecrawl search
```json5
{
plugins: {
entries: {
firecrawl: {
enabled: true,
},
},
},
tools: {
web: {
search: {
provider: "firecrawl",
firecrawl: {
apiKey: "FIRECRAWL_API_KEY_HERE",
baseUrl: "https://api.firecrawl.dev",
},
},
},
},
}
```
Notes:
- Choosing Firecrawl in onboarding or `openclaw configure --section web` enables the bundled Firecrawl plugin automatically.
- `web_search` with Firecrawl supports `query` and `count`.
- For Firecrawl-specific controls like `sources`, `categories`, or result scraping, use `firecrawl_search`.
## Configure Firecrawl scrape + web_fetch fallback
```json5
{
plugins: {
entries: {
firecrawl: {
enabled: true,
},
},
},
tools: {
web: {
fetch: {
@ -44,6 +88,38 @@ Notes:
- Firecrawl fallback attempts run only when an API key is available (`tools.web.fetch.firecrawl.apiKey` or `FIRECRAWL_API_KEY`).
- `maxAgeMs` controls how old cached results can be (ms). Default is 2 days.
`firecrawl_scrape` reuses the same `tools.web.fetch.firecrawl.*` settings and env vars.
## Firecrawl plugin tools
### `firecrawl_search`
Use this when you want Firecrawl-specific search controls instead of generic `web_search`.
Core parameters:
- `query`
- `count`
- `sources`
- `categories`
- `scrapeResults`
- `timeoutSeconds`
### `firecrawl_scrape`
Use this for JS-heavy or bot-protected pages where plain `web_fetch` falls short.
Core parameters:
- `url`
- `extractMode`
- `maxChars`
- `onlyMainContent`
- `maxAgeMs`
- `proxy`
- `storeInCache`
- `timeoutSeconds`
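As an example, a `firecrawl_scrape` call for a bot-protected page might pass arguments like these. Parameter names follow the list above; the values are illustrative only:

```typescript
// Example firecrawl_scrape arguments (values are examples, not defaults).
const scrapeArgs = {
  url: "https://example.com/js-heavy-page",
  extractMode: "markdown",
  maxChars: 50_000,
  onlyMainContent: true,
  maxAgeMs: 172_800_000, // accept cached results up to 2 days old
  proxy: "stealth", // bot circumvention; see the section below
  storeInCache: true,
  timeoutSeconds: 60,
};
```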
## Stealth / bot circumvention
Firecrawl exposes a **proxy mode** parameter for bot circumvention (`basic`, `stealth`, or `auto`).

View File

@ -256,7 +256,7 @@ Enable with `tools.loopDetection.enabled: true` (default is `false`).
### `web_search`
Search the web using Perplexity, Brave, Gemini, Grok, or Kimi.
Search the web using Brave, Firecrawl, Gemini, Grok, Kimi, or Perplexity.
Core parameters:

View File

@ -1,5 +1,5 @@
---
summary: "Web search + fetch tools (Brave, Gemini, Grok, Kimi, and Perplexity providers)"
summary: "Web search + fetch tools (Brave, Firecrawl, Gemini, Grok, Kimi, and Perplexity providers)"
read_when:
- You want to enable web_search or web_fetch
- You need provider API key setup
@ -11,7 +11,7 @@ title: "Web Tools"
OpenClaw ships two lightweight web tools:
- `web_search` — Search the web using Brave Search API, Gemini with Google Search grounding, Grok, Kimi, or Perplexity Search API.
- `web_search` — Search the web using Brave Search API, Firecrawl Search, Gemini with Google Search grounding, Grok, Kimi, or Perplexity Search API.
- `web_fetch` — HTTP fetch + readable extraction (HTML → markdown/text).
These are **not** browser automation. For JS-heavy sites or logins, use the
@ -24,18 +24,20 @@ These are **not** browser automation. For JS-heavy sites or logins, use the
- `web_fetch` does a plain HTTP GET and extracts readable content
(HTML → markdown/text). It does **not** execute JavaScript.
- `web_fetch` is enabled by default (unless explicitly disabled).
- The bundled Firecrawl plugin also adds `firecrawl_search` and `firecrawl_scrape` when enabled.
See [Brave Search setup](/brave-search) and [Perplexity Search setup](/perplexity) for provider-specific details.
## Choosing a search provider
| Provider | Result shape | Provider-specific filters | Notes | API key |
| ------------------------- | ---------------------------------- | -------------------------------------------- | ------------------------------------------------------------------------------ | ------------------------------------------- |
| **Brave Search API** | Structured results with snippets | `country`, `language`, `ui_lang`, time | Supports Brave `llm-context` mode | `BRAVE_API_KEY` |
| **Gemini** | AI-synthesized answers + citations | — | Uses Google Search grounding | `GEMINI_API_KEY` |
| **Grok** | AI-synthesized answers + citations | — | Uses xAI web-grounded responses | `XAI_API_KEY` |
| **Kimi** | AI-synthesized answers + citations | — | Uses Moonshot web search | `KIMI_API_KEY` / `MOONSHOT_API_KEY` |
| **Perplexity Search API** | Structured results with snippets | `country`, `language`, time, `domain_filter` | Supports content extraction controls; OpenRouter uses Sonar compatibility path | `PERPLEXITY_API_KEY` / `OPENROUTER_API_KEY` |
| Provider | Result shape | Provider-specific filters | Notes | API key |
| ------------------------- | ---------------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------------------------ | ------------------------------------------- |
| **Brave Search API** | Structured results with snippets | `country`, `language`, `ui_lang`, time | Supports Brave `llm-context` mode | `BRAVE_API_KEY` |
| **Firecrawl Search** | Structured results with snippets | Use `firecrawl_search` for Firecrawl-specific search options | Best for pairing search with Firecrawl scraping/extraction | `FIRECRAWL_API_KEY` |
| **Gemini** | AI-synthesized answers + citations | — | Uses Google Search grounding | `GEMINI_API_KEY` |
| **Grok** | AI-synthesized answers + citations | — | Uses xAI web-grounded responses | `XAI_API_KEY` |
| **Kimi** | AI-synthesized answers + citations | — | Uses Moonshot web search | `KIMI_API_KEY` / `MOONSHOT_API_KEY` |
| **Perplexity Search API** | Structured results with snippets | `country`, `language`, time, `domain_filter` | Supports content extraction controls; OpenRouter uses Sonar compatibility path | `PERPLEXITY_API_KEY` / `OPENROUTER_API_KEY` |
### Auto-detection
@ -46,6 +48,7 @@ The table above is alphabetical. If no `provider` is explicitly set, runtime aut
3. **Grok**`XAI_API_KEY` env var or `tools.web.search.grok.apiKey` config
4. **Kimi**`KIMI_API_KEY` / `MOONSHOT_API_KEY` env var or `tools.web.search.kimi.apiKey` config
5. **Perplexity**`PERPLEXITY_API_KEY`, `OPENROUTER_API_KEY`, or `tools.web.search.perplexity.apiKey` config
6. **Firecrawl**`FIRECRAWL_API_KEY` env var or `tools.web.search.firecrawl.apiKey` config
If no keys are found, it falls back to Brave (you'll get a missing-key error prompting you to configure one).
@ -86,6 +89,7 @@ See [Perplexity Search API Docs](https://docs.perplexity.ai/guides/search-quicks
**Via config:** run `openclaw configure --section web`. It stores the key under the provider-specific config path:
- Brave: `tools.web.search.apiKey`
- Firecrawl: `tools.web.search.firecrawl.apiKey`
- Gemini: `tools.web.search.gemini.apiKey`
- Grok: `tools.web.search.grok.apiKey`
- Kimi: `tools.web.search.kimi.apiKey`
@ -96,6 +100,7 @@ All of these fields also support SecretRef objects.
**Via environment:** set provider env vars in the Gateway process environment:
- Brave: `BRAVE_API_KEY`
- Firecrawl: `FIRECRAWL_API_KEY`
- Gemini: `GEMINI_API_KEY`
- Grok: `XAI_API_KEY`
- Kimi: `KIMI_API_KEY` or `MOONSHOT_API_KEY`
@ -121,6 +126,34 @@ For a gateway install, put these in `~/.openclaw/.env` (or your service environm
}
```
**Firecrawl Search:**
```json5
{
plugins: {
entries: {
firecrawl: {
enabled: true,
},
},
},
tools: {
web: {
search: {
enabled: true,
provider: "firecrawl",
firecrawl: {
apiKey: "fc-...", // optional if FIRECRAWL_API_KEY is set
baseUrl: "https://api.firecrawl.dev",
},
},
},
},
}
```
When you choose Firecrawl in onboarding or `openclaw configure --section web`, OpenClaw enables the bundled Firecrawl plugin automatically so `web_search`, `firecrawl_search`, and `firecrawl_scrape` are all available.
**Brave LLM Context mode:**
```json5
@ -234,6 +267,7 @@ Search the web using your configured provider.
- `tools.web.search.enabled` must not be `false` (default: enabled)
- API key for your chosen provider:
- **Brave**: `BRAVE_API_KEY` or `tools.web.search.apiKey`
- **Firecrawl**: `FIRECRAWL_API_KEY` or `tools.web.search.firecrawl.apiKey`
- **Gemini**: `GEMINI_API_KEY` or `tools.web.search.gemini.apiKey`
- **Grok**: `XAI_API_KEY` or `tools.web.search.grok.apiKey`
- **Kimi**: `KIMI_API_KEY`, `MOONSHOT_API_KEY`, or `tools.web.search.kimi.apiKey`
@ -260,7 +294,7 @@ Search the web using your configured provider.
### Tool parameters
All parameters work for Brave and for native Perplexity Search API unless noted.
Parameters depend on the selected provider.
Perplexity's OpenRouter / Sonar compatibility path supports only `query` and `freshness`.
If you set `tools.web.search.perplexity.baseUrl` / `model`, use `OPENROUTER_API_KEY`, or configure an `sk-or-...` key, Search API-only filters return explicit errors.
@ -279,6 +313,8 @@ If you set `tools.web.search.perplexity.baseUrl` / `model`, use `OPENROUTER_API_
| `max_tokens` | Total content budget, default 25000 (Perplexity only) |
| `max_tokens_per_page` | Per-page token limit, default 2048 (Perplexity only) |
Firecrawl `web_search` supports `query` and `count`. For Firecrawl-specific controls like `sources`, `categories`, result scraping, or scrape timeout, use `firecrawl_search` from the bundled Firecrawl plugin.
**Examples:**
```javascript

View File

@ -0,0 +1,100 @@
import { describe, expect, it } from "vitest";
import plugin from "./index.js";
import { __testing as firecrawlClientTesting } from "./src/firecrawl-client.js";
describe("firecrawl plugin", () => {
it("registers a web search provider and tools", () => {
const tools: Array<{ name: string }> = [];
const webSearchProviders: Array<{ id: string }> = [];
plugin.register?.({
config: {},
registerTool(tool: { name: string }) {
tools.push(tool);
},
registerWebSearchProvider(provider: { id: string }) {
webSearchProviders.push(provider);
},
} as never);
expect(webSearchProviders.map((provider) => provider.id)).toEqual(["firecrawl"]);
expect(tools.map((tool) => tool.name)).toEqual(["firecrawl_search", "firecrawl_scrape"]);
});
it("parses scrape payloads into wrapped external-content results", () => {
const result = firecrawlClientTesting.parseFirecrawlScrapePayload({
payload: {
success: true,
data: {
markdown: "# Hello\n\nWorld",
metadata: {
title: "Example page",
sourceURL: "https://example.com/final",
statusCode: 200,
},
},
},
url: "https://example.com/start",
extractMode: "text",
maxChars: 1000,
});
expect(result.finalUrl).toBe("https://example.com/final");
expect(result.status).toBe(200);
expect(result.extractor).toBe("firecrawl");
expect(typeof result.text).toBe("string");
});
it("extracts search items from flexible Firecrawl payload shapes", () => {
const items = firecrawlClientTesting.resolveSearchItems({
success: true,
data: [
{
title: "Docs",
url: "https://docs.example.com/path",
description: "Reference docs",
markdown: "Body",
},
],
});
expect(items).toEqual([
{
title: "Docs",
url: "https://docs.example.com/path",
description: "Reference docs",
content: "Body",
published: undefined,
siteName: "docs.example.com",
},
]);
});
it("extracts search items from Firecrawl v2 data.web payloads", () => {
const items = firecrawlClientTesting.resolveSearchItems({
success: true,
data: {
web: [
{
title: "API Platform - OpenAI",
url: "https://openai.com/api/",
description: "Build on the OpenAI API platform.",
markdown: "# API Platform",
position: 1,
},
],
},
});
expect(items).toEqual([
{
title: "API Platform - OpenAI",
url: "https://openai.com/api/",
description: "Build on the OpenAI API platform.",
content: "# API Platform",
published: undefined,
siteName: "openai.com",
},
]);
});
});

View File

@ -0,0 +1,20 @@
import type { AnyAgentTool } from "../../src/agents/tools/common.js";
import { emptyPluginConfigSchema } from "../../src/plugins/config-schema.js";
import type { OpenClawPluginApi } from "../../src/plugins/types.js";
import { createFirecrawlScrapeTool } from "./src/firecrawl-scrape-tool.js";
import { createFirecrawlWebSearchProvider } from "./src/firecrawl-search-provider.js";
import { createFirecrawlSearchTool } from "./src/firecrawl-search-tool.js";
const firecrawlPlugin = {
id: "firecrawl",
name: "Firecrawl Plugin",
description: "Bundled Firecrawl search and scrape plugin",
configSchema: emptyPluginConfigSchema(),
register(api: OpenClawPluginApi) {
api.registerWebSearchProvider(createFirecrawlWebSearchProvider());
api.registerTool(createFirecrawlSearchTool(api) as AnyAgentTool);
api.registerTool(createFirecrawlScrapeTool(api) as AnyAgentTool);
},
};
export default firecrawlPlugin;

View File

@ -0,0 +1,8 @@
{
"id": "firecrawl",
"configSchema": {
"type": "object",
"additionalProperties": false,
"properties": {}
}
}

View File

@ -0,0 +1,12 @@
{
"name": "@openclaw/firecrawl-plugin",
"version": "2026.3.14",
"private": true,
"description": "OpenClaw Firecrawl plugin",
"type": "module",
"openclaw": {
"extensions": [
"./index.ts"
]
}
}

View File

@ -0,0 +1,159 @@
import type { OpenClawConfig } from "../../../src/config/config.js";
import { normalizeResolvedSecretInputString } from "../../../src/config/types.secrets.js";
import { normalizeSecretInput } from "../../../src/utils/normalize-secret-input.js";
export const DEFAULT_FIRECRAWL_BASE_URL = "https://api.firecrawl.dev";
export const DEFAULT_FIRECRAWL_SEARCH_TIMEOUT_SECONDS = 30;
export const DEFAULT_FIRECRAWL_SCRAPE_TIMEOUT_SECONDS = 60;
export const DEFAULT_FIRECRAWL_MAX_AGE_MS = 172_800_000;
type WebSearchConfig = NonNullable<OpenClawConfig["tools"]>["web"] extends infer Web
? Web extends { search?: infer Search }
? Search
: undefined
: undefined;
type WebFetchConfig = NonNullable<OpenClawConfig["tools"]>["web"] extends infer Web
? Web extends { fetch?: infer Fetch }
? Fetch
: undefined
: undefined;
type FirecrawlSearchConfig =
| {
apiKey?: unknown;
baseUrl?: string;
}
| undefined;
type FirecrawlFetchConfig =
| {
apiKey?: unknown;
baseUrl?: string;
onlyMainContent?: boolean;
maxAgeMs?: number;
timeoutSeconds?: number;
}
| undefined;
function resolveSearchConfig(cfg?: OpenClawConfig): WebSearchConfig {
const search = cfg?.tools?.web?.search;
if (!search || typeof search !== "object") {
return undefined;
}
return search as WebSearchConfig;
}
function resolveFetchConfig(cfg?: OpenClawConfig): WebFetchConfig {
const fetch = cfg?.tools?.web?.fetch;
if (!fetch || typeof fetch !== "object") {
return undefined;
}
return fetch as WebFetchConfig;
}
export function resolveFirecrawlSearchConfig(cfg?: OpenClawConfig): FirecrawlSearchConfig {
const search = resolveSearchConfig(cfg);
if (!search || typeof search !== "object") {
return undefined;
}
const firecrawl = "firecrawl" in search ? search.firecrawl : undefined;
if (!firecrawl || typeof firecrawl !== "object") {
return undefined;
}
return firecrawl as FirecrawlSearchConfig;
}
export function resolveFirecrawlFetchConfig(cfg?: OpenClawConfig): FirecrawlFetchConfig {
const fetch = resolveFetchConfig(cfg);
if (!fetch || typeof fetch !== "object") {
return undefined;
}
const firecrawl = "firecrawl" in fetch ? fetch.firecrawl : undefined;
if (!firecrawl || typeof firecrawl !== "object") {
return undefined;
}
return firecrawl as FirecrawlFetchConfig;
}
function normalizeConfiguredSecret(value: unknown, path: string): string | undefined {
return normalizeSecretInput(
normalizeResolvedSecretInputString({
value,
path,
}),
);
}
export function resolveFirecrawlApiKey(cfg?: OpenClawConfig): string | undefined {
const search = resolveFirecrawlSearchConfig(cfg);
const fetch = resolveFirecrawlFetchConfig(cfg);
return (
normalizeConfiguredSecret(search?.apiKey, "tools.web.search.firecrawl.apiKey") ||
normalizeConfiguredSecret(fetch?.apiKey, "tools.web.fetch.firecrawl.apiKey") ||
normalizeSecretInput(process.env.FIRECRAWL_API_KEY) ||
undefined
);
}
export function resolveFirecrawlBaseUrl(cfg?: OpenClawConfig): string {
const search = resolveFirecrawlSearchConfig(cfg);
const fetch = resolveFirecrawlFetchConfig(cfg);
const configured =
(typeof search?.baseUrl === "string" ? search.baseUrl.trim() : "") ||
(typeof fetch?.baseUrl === "string" ? fetch.baseUrl.trim() : "") ||
normalizeSecretInput(process.env.FIRECRAWL_BASE_URL) ||
"";
return configured || DEFAULT_FIRECRAWL_BASE_URL;
}
export function resolveFirecrawlOnlyMainContent(cfg?: OpenClawConfig, override?: boolean): boolean {
if (typeof override === "boolean") {
return override;
}
const fetch = resolveFirecrawlFetchConfig(cfg);
if (typeof fetch?.onlyMainContent === "boolean") {
return fetch.onlyMainContent;
}
return true;
}
export function resolveFirecrawlMaxAgeMs(cfg?: OpenClawConfig, override?: number): number {
if (typeof override === "number" && Number.isFinite(override) && override >= 0) {
return Math.floor(override);
}
const fetch = resolveFirecrawlFetchConfig(cfg);
if (
typeof fetch?.maxAgeMs === "number" &&
Number.isFinite(fetch.maxAgeMs) &&
fetch.maxAgeMs >= 0
) {
return Math.floor(fetch.maxAgeMs);
}
return DEFAULT_FIRECRAWL_MAX_AGE_MS;
}
export function resolveFirecrawlScrapeTimeoutSeconds(
cfg?: OpenClawConfig,
override?: number,
): number {
if (typeof override === "number" && Number.isFinite(override) && override > 0) {
return Math.floor(override);
}
const fetch = resolveFirecrawlFetchConfig(cfg);
if (
typeof fetch?.timeoutSeconds === "number" &&
Number.isFinite(fetch.timeoutSeconds) &&
fetch.timeoutSeconds > 0
) {
return Math.floor(fetch.timeoutSeconds);
}
return DEFAULT_FIRECRAWL_SCRAPE_TIMEOUT_SECONDS;
}
export function resolveFirecrawlSearchTimeoutSeconds(override?: number): number {
if (typeof override === "number" && Number.isFinite(override) && override > 0) {
return Math.floor(override);
}
return DEFAULT_FIRECRAWL_SEARCH_TIMEOUT_SECONDS;
}

View File

@ -0,0 +1,446 @@
import { markdownToText, truncateText } from "../../../src/agents/tools/web-fetch-utils.js";
import { withTrustedWebToolsEndpoint } from "../../../src/agents/tools/web-guarded-fetch.js";
import {
DEFAULT_CACHE_TTL_MINUTES,
normalizeCacheKey,
readCache,
readResponseText,
resolveCacheTtlMs,
writeCache,
} from "../../../src/agents/tools/web-shared.js";
import type { OpenClawConfig } from "../../../src/config/config.js";
import { wrapExternalContent, wrapWebContent } from "../../../src/security/external-content.js";
import {
resolveFirecrawlApiKey,
resolveFirecrawlBaseUrl,
resolveFirecrawlMaxAgeMs,
resolveFirecrawlOnlyMainContent,
resolveFirecrawlScrapeTimeoutSeconds,
resolveFirecrawlSearchTimeoutSeconds,
} from "./config.js";
const SEARCH_CACHE = new Map<
string,
{ value: Record<string, unknown>; expiresAt: number; insertedAt: number }
>();
const SCRAPE_CACHE = new Map<
string,
{ value: Record<string, unknown>; expiresAt: number; insertedAt: number }
>();
const DEFAULT_SEARCH_COUNT = 5;
const DEFAULT_SCRAPE_MAX_CHARS = 50_000;
const DEFAULT_ERROR_MAX_BYTES = 64_000;
type FirecrawlSearchItem = {
title: string;
url: string;
description?: string;
content?: string;
published?: string;
siteName?: string;
};
export type FirecrawlSearchParams = {
cfg?: OpenClawConfig;
query: string;
count?: number;
timeoutSeconds?: number;
sources?: string[];
categories?: string[];
scrapeResults?: boolean;
};
export type FirecrawlScrapeParams = {
cfg?: OpenClawConfig;
url: string;
extractMode: "markdown" | "text";
maxChars?: number;
onlyMainContent?: boolean;
maxAgeMs?: number;
proxy?: "auto" | "basic" | "stealth";
storeInCache?: boolean;
timeoutSeconds?: number;
};
function resolveEndpoint(baseUrl: string, pathname: "/v2/search" | "/v2/scrape"): string {
const trimmed = baseUrl.trim();
if (!trimmed) {
return new URL(pathname, "https://api.firecrawl.dev").toString();
}
try {
const url = new URL(trimmed);
if (url.pathname && url.pathname !== "/") {
return url.toString();
}
url.pathname = pathname;
return url.toString();
} catch {
return new URL(pathname, "https://api.firecrawl.dev").toString();
}
}
function resolveSiteName(urlRaw: string): string | undefined {
try {
const host = new URL(urlRaw).hostname.replace(/^www\./, "");
return host || undefined;
} catch {
return undefined;
}
}
async function postFirecrawlJson(params: {
baseUrl: string;
pathname: "/v2/search" | "/v2/scrape";
apiKey: string;
body: Record<string, unknown>;
timeoutSeconds: number;
errorLabel: string;
}): Promise<Record<string, unknown>> {
const endpoint = resolveEndpoint(params.baseUrl, params.pathname);
return await withTrustedWebToolsEndpoint(
{
url: endpoint,
timeoutSeconds: params.timeoutSeconds,
init: {
method: "POST",
headers: {
Accept: "application/json",
Authorization: `Bearer ${params.apiKey}`,
"Content-Type": "application/json",
},
body: JSON.stringify(params.body),
},
},
async ({ response }) => {
if (!response.ok) {
const detail = await readResponseText(response, { maxBytes: DEFAULT_ERROR_MAX_BYTES });
throw new Error(
`${params.errorLabel} API error (${response.status}): ${detail.text || response.statusText}`,
);
}
const payload = (await response.json()) as Record<string, unknown>;
if (payload.success === false) {
const error =
typeof payload.error === "string"
? payload.error
: typeof payload.message === "string"
? payload.message
: "unknown error";
throw new Error(`${params.errorLabel} API error: ${error}`);
}
return payload;
},
);
}
function resolveSearchItems(payload: Record<string, unknown>): FirecrawlSearchItem[] {
const candidates = [
payload.data,
payload.results,
(payload.data as { results?: unknown } | undefined)?.results,
(payload.data as { data?: unknown } | undefined)?.data,
(payload.data as { web?: unknown } | undefined)?.web,
(payload.web as { results?: unknown } | undefined)?.results,
];
const rawItems = candidates.find((candidate) => Array.isArray(candidate));
if (!Array.isArray(rawItems)) {
return [];
}
const items: FirecrawlSearchItem[] = [];
for (const entry of rawItems) {
if (!entry || typeof entry !== "object") {
continue;
}
const record = entry as Record<string, unknown>;
const metadata =
record.metadata && typeof record.metadata === "object"
? (record.metadata as Record<string, unknown>)
: undefined;
const url =
(typeof record.url === "string" && record.url) ||
(typeof record.sourceURL === "string" && record.sourceURL) ||
(typeof record.sourceUrl === "string" && record.sourceUrl) ||
(typeof metadata?.sourceURL === "string" && metadata.sourceURL) ||
"";
if (!url) {
continue;
}
const title =
(typeof record.title === "string" && record.title) ||
(typeof metadata?.title === "string" && metadata.title) ||
"";
const description =
(typeof record.description === "string" && record.description) ||
(typeof record.snippet === "string" && record.snippet) ||
(typeof record.summary === "string" && record.summary) ||
undefined;
const content =
(typeof record.markdown === "string" && record.markdown) ||
(typeof record.content === "string" && record.content) ||
(typeof record.text === "string" && record.text) ||
undefined;
const published =
(typeof record.publishedDate === "string" && record.publishedDate) ||
(typeof record.published === "string" && record.published) ||
(typeof metadata?.publishedTime === "string" && metadata.publishedTime) ||
(typeof metadata?.publishedDate === "string" && metadata.publishedDate) ||
undefined;
items.push({
title,
url,
description,
content,
published,
siteName: resolveSiteName(url),
});
}
return items;
}
function buildSearchPayload(params: {
query: string;
provider: "firecrawl";
items: FirecrawlSearchItem[];
tookMs: number;
scrapeResults: boolean;
}): Record<string, unknown> {
return {
query: params.query,
provider: params.provider,
count: params.items.length,
tookMs: params.tookMs,
externalContent: {
untrusted: true,
source: "web_search",
provider: params.provider,
wrapped: true,
},
results: params.items.map((entry) => ({
title: entry.title ? wrapWebContent(entry.title, "web_search") : "",
url: entry.url,
description: entry.description ? wrapWebContent(entry.description, "web_search") : "",
...(entry.published ? { published: entry.published } : {}),
...(entry.siteName ? { siteName: entry.siteName } : {}),
...(params.scrapeResults && entry.content
? { content: wrapWebContent(entry.content, "web_search") }
: {}),
})),
};
}
export async function runFirecrawlSearch(
params: FirecrawlSearchParams,
): Promise<Record<string, unknown>> {
const apiKey = resolveFirecrawlApiKey(params.cfg);
if (!apiKey) {
throw new Error(
"web_search (firecrawl) needs a Firecrawl API key. Set FIRECRAWL_API_KEY in the Gateway environment, or configure tools.web.search.firecrawl.apiKey.",
);
}
const count =
typeof params.count === "number" && Number.isFinite(params.count)
? Math.max(1, Math.min(10, Math.floor(params.count)))
: DEFAULT_SEARCH_COUNT;
const timeoutSeconds = resolveFirecrawlSearchTimeoutSeconds(params.timeoutSeconds);
const scrapeResults = params.scrapeResults === true;
const sources = Array.isArray(params.sources) ? params.sources.filter(Boolean) : [];
const categories = Array.isArray(params.categories) ? params.categories.filter(Boolean) : [];
const baseUrl = resolveFirecrawlBaseUrl(params.cfg);
const cacheKey = normalizeCacheKey(
JSON.stringify({
type: "firecrawl-search",
q: params.query,
count,
baseUrl,
sources,
categories,
scrapeResults,
}),
);
const cached = readCache(SEARCH_CACHE, cacheKey);
if (cached) {
return { ...cached.value, cached: true };
}
const body: Record<string, unknown> = {
query: params.query,
limit: count,
};
if (sources.length > 0) {
body.sources = sources;
}
if (categories.length > 0) {
body.categories = categories;
}
if (scrapeResults) {
body.scrapeOptions = {
formats: ["markdown"],
};
}
const start = Date.now();
const payload = await postFirecrawlJson({
baseUrl,
pathname: "/v2/search",
apiKey,
body,
timeoutSeconds,
errorLabel: "Firecrawl Search",
});
const result = buildSearchPayload({
query: params.query,
provider: "firecrawl",
items: resolveSearchItems(payload),
tookMs: Date.now() - start,
scrapeResults,
});
writeCache(
SEARCH_CACHE,
cacheKey,
result,
resolveCacheTtlMs(undefined, DEFAULT_CACHE_TTL_MINUTES),
);
return result;
}
function resolveScrapeData(payload: Record<string, unknown>): Record<string, unknown> {
const data = payload.data;
if (data && typeof data === "object") {
return data as Record<string, unknown>;
}
return {};
}
export function parseFirecrawlScrapePayload(params: {
payload: Record<string, unknown>;
url: string;
extractMode: "markdown" | "text";
maxChars: number;
}): Record<string, unknown> {
const data = resolveScrapeData(params.payload);
const metadata =
data.metadata && typeof data.metadata === "object"
? (data.metadata as Record<string, unknown>)
: undefined;
const markdown =
(typeof data.markdown === "string" && data.markdown) ||
(typeof data.content === "string" && data.content) ||
"";
if (!markdown) {
throw new Error("Firecrawl scrape returned no content.");
}
const rawText = params.extractMode === "text" ? markdownToText(markdown) : markdown;
const truncated = truncateText(rawText, params.maxChars);
return {
url: params.url,
finalUrl:
(typeof metadata?.sourceURL === "string" && metadata.sourceURL) ||
(typeof data.url === "string" && data.url) ||
params.url,
status:
(typeof metadata?.statusCode === "number" && metadata.statusCode) ||
(typeof data.statusCode === "number" && data.statusCode) ||
undefined,
title:
typeof metadata?.title === "string" && metadata.title
? wrapExternalContent(metadata.title, { source: "web_fetch", includeWarning: false })
: undefined,
extractor: "firecrawl",
extractMode: params.extractMode,
externalContent: {
untrusted: true,
source: "web_fetch",
wrapped: true,
},
truncated: truncated.truncated,
rawLength: rawText.length,
wrappedLength: wrapExternalContent(truncated.text, {
source: "web_fetch",
includeWarning: false,
}).length,
text: wrapExternalContent(truncated.text, {
source: "web_fetch",
includeWarning: false,
}),
warning:
typeof params.payload.warning === "string" && params.payload.warning
? wrapExternalContent(params.payload.warning, {
source: "web_fetch",
includeWarning: false,
})
: undefined,
};
}
export async function runFirecrawlScrape(
params: FirecrawlScrapeParams,
): Promise<Record<string, unknown>> {
const apiKey = resolveFirecrawlApiKey(params.cfg);
if (!apiKey) {
throw new Error(
"firecrawl_scrape needs a Firecrawl API key. Set FIRECRAWL_API_KEY in the Gateway environment, or configure tools.web.fetch.firecrawl.apiKey.",
);
}
const baseUrl = resolveFirecrawlBaseUrl(params.cfg);
const timeoutSeconds = resolveFirecrawlScrapeTimeoutSeconds(params.cfg, params.timeoutSeconds);
const onlyMainContent = resolveFirecrawlOnlyMainContent(params.cfg, params.onlyMainContent);
const maxAgeMs = resolveFirecrawlMaxAgeMs(params.cfg, params.maxAgeMs);
const proxy = params.proxy ?? "auto";
const storeInCache = params.storeInCache ?? true;
const maxChars =
typeof params.maxChars === "number" && Number.isFinite(params.maxChars) && params.maxChars > 0
? Math.floor(params.maxChars)
: DEFAULT_SCRAPE_MAX_CHARS;
const cacheKey = normalizeCacheKey(
JSON.stringify({
type: "firecrawl-scrape",
url: params.url,
extractMode: params.extractMode,
baseUrl,
onlyMainContent,
maxAgeMs,
proxy,
storeInCache,
maxChars,
}),
);
const cached = readCache(SCRAPE_CACHE, cacheKey);
if (cached) {
return { ...cached.value, cached: true };
}
const payload = await postFirecrawlJson({
baseUrl,
pathname: "/v2/scrape",
apiKey,
timeoutSeconds,
errorLabel: "Firecrawl",
body: {
url: params.url,
formats: ["markdown"],
onlyMainContent,
timeout: timeoutSeconds * 1000,
maxAge: maxAgeMs,
proxy,
storeInCache,
},
});
const result = parseFirecrawlScrapePayload({
payload,
url: params.url,
extractMode: params.extractMode,
maxChars,
});
writeCache(
SCRAPE_CACHE,
cacheKey,
result,
resolveCacheTtlMs(undefined, DEFAULT_CACHE_TTL_MINUTES),
);
return result;
}
export const __testing = {
parseFirecrawlScrapePayload,
resolveSearchItems,
};
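Firecrawl's v2/search payload shape has varied across responses, which is why `resolveSearchItems` above probes several candidate arrays before normalizing. A minimal standalone sketch of that normalization (a simplified re-implementation for illustration, not the module's export; it omits the metadata, description, and content handling):

```typescript
// Simplified sketch of the payload normalization above: Firecrawl may nest
// results under data, data.web, or results, so probe candidate arrays in
// priority order and keep the first match.
type SearchItem = { title: string; url: string };

function pickResults(payload: Record<string, unknown>): unknown[] {
  const data = payload.data as Record<string, unknown> | undefined;
  const candidates = [payload.data, payload.results, data?.results, data?.web];
  const found = candidates.find((c) => Array.isArray(c));
  return Array.isArray(found) ? found : [];
}

function normalize(payload: Record<string, unknown>): SearchItem[] {
  const items: SearchItem[] = [];
  for (const entry of pickResults(payload)) {
    if (!entry || typeof entry !== "object") {
      continue;
    }
    const rec = entry as Record<string, unknown>;
    const url = typeof rec.url === "string" ? rec.url : "";
    if (!url) {
      continue; // entries without a resolvable URL are dropped
    }
    items.push({ title: typeof rec.title === "string" ? rec.title : "", url });
  }
  return items;
}

// Both payload shapes normalize to the same flat list:
const a = normalize({ data: { web: [{ title: "Docs", url: "https://example.com" }] } });
const b = normalize({ results: [{ title: "Docs", url: "https://example.com" }] });
```

Entries without a URL are skipped rather than emitted half-empty, mirroring the guard in the real implementation.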

@ -0,0 +1,89 @@
import { Type } from "@sinclair/typebox";
import { optionalStringEnum } from "../../../src/agents/schema/typebox.js";
import { jsonResult, readNumberParam, readStringParam } from "../../../src/agents/tools/common.js";
import type { OpenClawPluginApi } from "../../../src/plugins/types.js";
import { runFirecrawlScrape } from "./firecrawl-client.js";
const FirecrawlScrapeToolSchema = Type.Object(
{
url: Type.String({ description: "HTTP or HTTPS URL to scrape via Firecrawl." }),
extractMode: optionalStringEnum(["markdown", "text"] as const, {
description: 'Extraction mode ("markdown" or "text"). Default: markdown.',
}),
maxChars: Type.Optional(
Type.Number({
description: "Maximum characters to return.",
minimum: 100,
}),
),
onlyMainContent: Type.Optional(
Type.Boolean({
description: "Keep only main content when Firecrawl supports it.",
}),
),
maxAgeMs: Type.Optional(
Type.Number({
description: "Maximum Firecrawl cache age in milliseconds.",
minimum: 0,
}),
),
proxy: optionalStringEnum(["auto", "basic", "stealth"] as const, {
description: 'Firecrawl proxy mode ("auto", "basic", or "stealth").',
}),
storeInCache: Type.Optional(
Type.Boolean({
description: "Whether Firecrawl should store the scrape in its cache.",
}),
),
timeoutSeconds: Type.Optional(
Type.Number({
description: "Timeout in seconds for the Firecrawl scrape request.",
minimum: 1,
}),
),
},
{ additionalProperties: false },
);
export function createFirecrawlScrapeTool(api: OpenClawPluginApi) {
return {
name: "firecrawl_scrape",
label: "Firecrawl Scrape",
description:
"Scrape a page using Firecrawl v2/scrape. Useful for JS-heavy or bot-protected pages where plain web_fetch is weak.",
parameters: FirecrawlScrapeToolSchema,
execute: async (_toolCallId: string, rawParams: Record<string, unknown>) => {
const url = readStringParam(rawParams, "url", { required: true });
const extractMode =
readStringParam(rawParams, "extractMode") === "text" ? "text" : "markdown";
const maxChars = readNumberParam(rawParams, "maxChars", { integer: true });
const maxAgeMs = readNumberParam(rawParams, "maxAgeMs", { integer: true });
const timeoutSeconds = readNumberParam(rawParams, "timeoutSeconds", {
integer: true,
});
const proxyRaw = readStringParam(rawParams, "proxy");
const proxy =
proxyRaw === "basic" || proxyRaw === "stealth" || proxyRaw === "auto"
? proxyRaw
: undefined;
const onlyMainContent =
typeof rawParams.onlyMainContent === "boolean" ? rawParams.onlyMainContent : undefined;
const storeInCache =
typeof rawParams.storeInCache === "boolean" ? rawParams.storeInCache : undefined;
return jsonResult(
await runFirecrawlScrape({
cfg: api.config,
url,
extractMode,
maxChars,
onlyMainContent,
maxAgeMs,
proxy,
storeInCache,
timeoutSeconds,
}),
);
},
};
}

@ -0,0 +1,63 @@
import { Type } from "@sinclair/typebox";
import type { WebSearchProviderPlugin } from "../../../src/plugins/types.js";
import { runFirecrawlSearch } from "./firecrawl-client.js";
const GenericFirecrawlSearchSchema = Type.Object(
{
query: Type.String({ description: "Search query string." }),
count: Type.Optional(
Type.Number({
description: "Number of results to return (1-10).",
minimum: 1,
maximum: 10,
}),
),
},
{ additionalProperties: false },
);
function getScopedCredentialValue(searchConfig?: Record<string, unknown>): unknown {
const scoped = searchConfig?.firecrawl;
if (!scoped || typeof scoped !== "object" || Array.isArray(scoped)) {
return undefined;
}
return (scoped as Record<string, unknown>).apiKey;
}
function setScopedCredentialValue(
searchConfigTarget: Record<string, unknown>,
value: unknown,
): void {
const scoped = searchConfigTarget.firecrawl;
if (!scoped || typeof scoped !== "object" || Array.isArray(scoped)) {
searchConfigTarget.firecrawl = { apiKey: value };
return;
}
(scoped as Record<string, unknown>).apiKey = value;
}
export function createFirecrawlWebSearchProvider(): WebSearchProviderPlugin {
return {
id: "firecrawl",
label: "Firecrawl Search",
hint: "Structured results with optional result scraping",
envVars: ["FIRECRAWL_API_KEY"],
placeholder: "fc-...",
signupUrl: "https://www.firecrawl.dev/",
docsUrl: "https://docs.openclaw.ai/tools/firecrawl",
autoDetectOrder: 60,
getCredentialValue: getScopedCredentialValue,
setCredentialValue: setScopedCredentialValue,
createTool: (ctx) => ({
description:
"Search the web using Firecrawl. Returns structured results with snippets from Firecrawl Search. Use firecrawl_search for Firecrawl-specific knobs like sources or categories.",
parameters: GenericFirecrawlSearchSchema,
execute: async (args) =>
await runFirecrawlSearch({
cfg: ctx.config,
query: typeof args.query === "string" ? args.query : "",
count: typeof args.count === "number" ? args.count : undefined,
}),
}),
};
}
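The provider above stores its key under `search.firecrawl.apiKey` via scoped accessors, rather than the flat `apiKey` field Brave uses. A minimal standalone sketch of that round-trip (helper names here are illustrative, not the module's exports):

```typescript
// Sketch of the scoped-credential helpers above: the Firecrawl key lives
// under search.firecrawl.apiKey, and the setter rebuilds the nested object
// when it is missing or malformed.
function getScoped(search?: Record<string, unknown>): unknown {
  const scoped = search?.firecrawl;
  if (!scoped || typeof scoped !== "object" || Array.isArray(scoped)) {
    return undefined;
  }
  return (scoped as Record<string, unknown>).apiKey;
}

function setScoped(search: Record<string, unknown>, value: unknown): void {
  const scoped = search.firecrawl;
  if (!scoped || typeof scoped !== "object" || Array.isArray(scoped)) {
    search.firecrawl = { apiKey: value }; // replace malformed/missing scope
    return;
  }
  (scoped as Record<string, unknown>).apiKey = value;
}

// Round-trip on an empty search config:
const search: Record<string, unknown> = {};
setScoped(search, "fc-demo-key");

// A malformed scalar scope is replaced rather than mutated in place:
const malformed: Record<string, unknown> = { firecrawl: "oops" };
setScoped(malformed, "fc-demo-key-2");
```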

@ -0,0 +1,76 @@
import { Type } from "@sinclair/typebox";
import {
jsonResult,
readNumberParam,
readStringArrayParam,
readStringParam,
} from "../../../src/agents/tools/common.js";
import type { OpenClawPluginApi } from "../../../src/plugins/types.js";
import { runFirecrawlSearch } from "./firecrawl-client.js";
const FirecrawlSearchToolSchema = Type.Object(
{
query: Type.String({ description: "Search query string." }),
count: Type.Optional(
Type.Number({
description: "Number of results to return (1-10).",
minimum: 1,
maximum: 10,
}),
),
sources: Type.Optional(
Type.Array(Type.String(), {
description: 'Optional sources list, for example ["web"], ["news"], or ["images"].',
}),
),
categories: Type.Optional(
Type.Array(Type.String(), {
description: 'Optional Firecrawl categories, for example ["github"] or ["research"].',
}),
),
scrapeResults: Type.Optional(
Type.Boolean({
description: "Include scraped result content when Firecrawl returns it.",
}),
),
timeoutSeconds: Type.Optional(
Type.Number({
description: "Timeout in seconds for the Firecrawl Search request.",
minimum: 1,
}),
),
},
{ additionalProperties: false },
);
export function createFirecrawlSearchTool(api: OpenClawPluginApi) {
return {
name: "firecrawl_search",
label: "Firecrawl Search",
description:
"Search the web using Firecrawl v2/search. Can optionally include scraped content from result pages.",
parameters: FirecrawlSearchToolSchema,
execute: async (_toolCallId: string, rawParams: Record<string, unknown>) => {
const query = readStringParam(rawParams, "query", { required: true });
const count = readNumberParam(rawParams, "count", { integer: true });
const timeoutSeconds = readNumberParam(rawParams, "timeoutSeconds", {
integer: true,
});
const sources = readStringArrayParam(rawParams, "sources");
const categories = readStringArrayParam(rawParams, "categories");
const scrapeResults = rawParams.scrapeResults === true;
return jsonResult(
await runFirecrawlSearch({
cfg: api.config,
query,
count,
timeoutSeconds,
sources,
categories,
scrapeResults,
}),
);
},
};
}

@ -206,27 +206,33 @@ function exceedsEstimatedHtmlNestingDepth(html: string, maxDepth: number): boole
return false;
}
export async function extractBasicHtmlContent(params: {
html: string;
extractMode: ExtractMode;
}): Promise<{ text: string; title?: string } | null> {
const cleanHtml = await sanitizeHtml(params.html);
const rendered = htmlToMarkdown(cleanHtml);
if (params.extractMode === "text") {
const text =
stripInvisibleUnicode(markdownToText(rendered.text)) ||
stripInvisibleUnicode(normalizeWhitespace(stripTags(cleanHtml)));
return text ? { text, title: rendered.title } : null;
}
const text = stripInvisibleUnicode(rendered.text);
return text ? { text, title: rendered.title } : null;
}
export async function extractReadableContent(params: {
html: string;
url: string;
extractMode: ExtractMode;
}): Promise<{ text: string; title?: string } | null> {
const cleanHtml = await sanitizeHtml(params.html);
const fallback = (): { text: string; title?: string } => {
const rendered = htmlToMarkdown(cleanHtml);
if (params.extractMode === "text") {
const text =
stripInvisibleUnicode(markdownToText(rendered.text)) ||
stripInvisibleUnicode(normalizeWhitespace(stripTags(cleanHtml)));
return { text, title: rendered.title };
}
return { text: stripInvisibleUnicode(rendered.text), title: rendered.title };
};
if (
cleanHtml.length > READABILITY_MAX_HTML_CHARS ||
exceedsEstimatedHtmlNestingDepth(cleanHtml, READABILITY_MAX_ESTIMATED_NESTING_DEPTH)
) {
-    return fallback();
+    return null;
}
try {
const { Readability, parseHTML } = await loadReadabilityDeps();
@ -239,16 +245,17 @@ export async function extractReadableContent(params: {
const reader = new Readability(document, { charThreshold: 0 });
const parsed = reader.parse();
if (!parsed?.content) {
-    return fallback();
+    return null;
}
const title = parsed.title || undefined;
if (params.extractMode === "text") {
const text = stripInvisibleUnicode(normalizeWhitespace(parsed.textContent ?? ""));
-    return text ? { text, title } : fallback();
+    return text ? { text, title } : null;
}
const rendered = htmlToMarkdown(parsed.content);
-    return { text: stripInvisibleUnicode(rendered.text), title: title ?? rendered.title };
+    const text = stripInvisibleUnicode(rendered.text);
+    return text ? { text, title: title ?? rendered.title } : null;
} catch {
-    return fallback();
+    return null;
}
}

@ -10,13 +10,14 @@ import { stringEnum } from "../schema/typebox.js";
import type { AnyAgentTool } from "./common.js";
import { jsonResult, readNumberParam, readStringParam } from "./common.js";
import {
extractBasicHtmlContent,
extractReadableContent,
htmlToMarkdown,
markdownToText,
truncateText,
type ExtractMode,
} from "./web-fetch-utils.js";
-import { fetchWithWebToolsNetworkGuard } from "./web-guarded-fetch.js";
+import { fetchWithWebToolsNetworkGuard, withTrustedWebToolsEndpoint } from "./web-guarded-fetch.js";
import {
CacheEntry,
DEFAULT_CACHE_TTL_MINUTES,
@ -26,7 +27,6 @@ import {
readResponseText,
resolveCacheTtlMs,
resolveTimeoutSeconds,
-  withTimeout,
writeCache,
} from "./web-shared.js";
@ -161,11 +161,12 @@ function resolveFirecrawlEnabled(params: {
}
function resolveFirecrawlBaseUrl(firecrawl?: FirecrawlFetchConfig): string {
-  const raw =
+  const fromConfig =
    firecrawl && "baseUrl" in firecrawl && typeof firecrawl.baseUrl === "string"
      ? firecrawl.baseUrl.trim()
      : "";
-  return raw || DEFAULT_FIRECRAWL_BASE_URL;
+  const fromEnv = normalizeSecretInput(process.env.FIRECRAWL_BASE_URL);
+  return fromConfig || fromEnv || DEFAULT_FIRECRAWL_BASE_URL;
}
function resolveFirecrawlOnlyMainContent(firecrawl?: FirecrawlFetchConfig): boolean {
@ -381,54 +382,59 @@ export async function fetchFirecrawlContent(params: {
proxy: params.proxy,
storeInCache: params.storeInCache,
};
-  const res = await fetch(endpoint, {
-    method: "POST",
-    headers: {
-      Authorization: `Bearer ${params.apiKey}`,
-      "Content-Type": "application/json",
+  return await withTrustedWebToolsEndpoint(
+    {
+      url: endpoint,
+      timeoutSeconds: params.timeoutSeconds,
+      init: {
+        method: "POST",
+        headers: {
+          Authorization: `Bearer ${params.apiKey}`,
+          "Content-Type": "application/json",
+        },
+        body: JSON.stringify(body),
+      },
+    },
-    body: JSON.stringify(body),
-    signal: withTimeout(undefined, params.timeoutSeconds * 1000),
-  });
-  const payload = (await res.json()) as {
-    success?: boolean;
-    data?: {
-      markdown?: string;
-      content?: string;
-      metadata?: {
-        title?: string;
-        sourceURL?: string;
-        statusCode?: number;
+    async ({ response }) => {
+      const payload = (await response.json()) as {
+        success?: boolean;
+        data?: {
+          markdown?: string;
+          content?: string;
+          metadata?: {
+            title?: string;
+            sourceURL?: string;
+            statusCode?: number;
+          };
+        };
+        warning?: string;
+        error?: string;
+      };
-    };
-    warning?: string;
-    error?: string;
-  };
-  if (!res.ok || payload?.success === false) {
-    const detail = payload?.error ?? "";
-    throw new Error(
-      `Firecrawl fetch failed (${res.status}): ${wrapWebContent(detail || res.statusText, "web_fetch")}`.trim(),
-    );
-  }
+      if (!response.ok || payload?.success === false) {
+        const detail = payload?.error ?? "";
+        throw new Error(
+          `Firecrawl fetch failed (${response.status}): ${wrapWebContent(detail || response.statusText, "web_fetch")}`.trim(),
+        );
+      }
-  const data = payload?.data ?? {};
-  const rawText =
-    typeof data.markdown === "string"
-      ? data.markdown
-      : typeof data.content === "string"
-        ? data.content
-        : "";
-  const text = params.extractMode === "text" ? markdownToText(rawText) : rawText;
-  return {
-    text,
-    title: data.metadata?.title,
-    finalUrl: data.metadata?.sourceURL,
-    status: data.metadata?.statusCode,
-    warning: payload?.warning,
-  };
+      const data = payload?.data ?? {};
+      const rawText =
+        typeof data.markdown === "string"
+          ? data.markdown
+          : typeof data.content === "string"
+            ? data.content
+            : "";
+      const text = params.extractMode === "text" ? markdownToText(rawText) : rawText;
+      return {
+        text,
+        title: data.metadata?.title,
+        finalUrl: data.metadata?.sourceURL,
+        status: data.metadata?.statusCode,
+        warning: payload?.warning,
+      };
+    },
+  );
}
type FirecrawlRuntimeParams = {
@ -629,9 +635,19 @@ async function runWebFetch(params: WebFetchRuntimeParams): Promise<Record<string
title = firecrawl.title;
extractor = "firecrawl";
} else {
-        throw new Error(
-          "Web fetch extraction failed: Readability and Firecrawl returned no content.",
-        );
+        const basic = await extractBasicHtmlContent({
+          html: body,
+          extractMode: params.extractMode,
+        });
+        if (basic?.text) {
+          text = basic.text;
+          title = basic.title;
+          extractor = "raw-html";
+        } else {
+          throw new Error(
+            "Web fetch extraction failed: Readability, Firecrawl, and basic HTML cleanup returned no content.",
+          );
+        }
}
}
} else {
@ -784,3 +800,7 @@ export function createWebFetchTool(options?: {
},
};
}
export const __testing = {
resolveFirecrawlBaseUrl,
};
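The `runWebFetch` change above turns a hard failure into a three-stage chain: Readability first, Firecrawl when available, then basic HTML cleanup before giving up. A simplified synchronous sketch of that ordering (the real chain is async and interleaves configuration checks; names here are illustrative):

```typescript
// Sketch of the extractor fallback chain above: try each stage in order and
// throw only after every stage yields no content.
type Extracted = { text: string; title?: string } | null;

function extractWithFallbacks(
  stages: Array<[name: string, run: () => Extracted]>,
): { extractor: string; text: string; title?: string } {
  for (const [name, run] of stages) {
    let result: Extracted = null;
    try {
      result = run(); // a throwing stage falls through to the next one
    } catch {
      result = null;
    }
    if (result?.text) {
      return { extractor: name, ...result };
    }
  }
  throw new Error(
    "Web fetch extraction failed: Readability, Firecrawl, and basic HTML cleanup returned no content.",
  );
}

// Later stages only run when earlier ones return nothing:
const demo = extractWithFallbacks([
  ["readability", () => null],
  ["firecrawl", () => null],
  ["raw-html", () => ({ text: "Shell App", title: "Shell App" })],
]);
```

The `extractor` field in the result mirrors how the tool reports which stage produced the content (`"raw-html"` in the shell-app test above).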

@ -3,6 +3,7 @@ import { afterEach, beforeEach, describe, expect, it, vi } from "vitest";
import * as ssrf from "../../infra/net/ssrf.js";
import { resolveRequestUrl } from "../../plugin-sdk/request-url.js";
import { withFetchPreconnect } from "../../test-utils/fetch-mock.js";
import { __testing as webFetchTesting } from "./web-fetch.js";
import { makeFetchHeaders } from "./web-fetch.test-harness.js";
import { createWebFetchTool } from "./web-tools.js";
@ -324,6 +325,40 @@ describe("web_fetch extraction fallbacks", () => {
expect(authHeader).toBe("Bearer firecrawl-test-key");
});
it("uses FIRECRAWL_BASE_URL env var when firecrawl.baseUrl is unset", async () => {
vi.stubEnv("FIRECRAWL_BASE_URL", "https://fc.example.com");
expect(webFetchTesting.resolveFirecrawlBaseUrl({})).toBe("https://fc.example.com");
});
it("uses guarded endpoint fetch for firecrawl requests", async () => {
vi.stubEnv("HTTP_PROXY", "http://127.0.0.1:7890");
const fetchSpy = installMockFetch((input: RequestInfo | URL) => {
const url = resolveRequestUrl(input);
if (url.includes("api.firecrawl.dev/v2/scrape")) {
return Promise.resolve(
firecrawlResponse("firecrawl guarded transport"),
) as Promise<Response>;
}
return Promise.resolve(
htmlResponse("<!doctype html><html><head></head><body></body></html>", url),
) as Promise<Response>;
});
const tool = createFirecrawlTool();
const result = await executeFetch(tool, { url: "https://example.com/guarded-firecrawl" });
expect(result?.details).toMatchObject({ extractor: "firecrawl" });
const firecrawlCall = fetchSpy.mock.calls.find((call) =>
resolveRequestUrl(call[0]).includes("/v2/scrape"),
);
expect(firecrawlCall).toBeTruthy();
const requestInit = firecrawlCall?.[1] as (RequestInit & { dispatcher?: unknown }) | undefined;
expect(requestInit?.dispatcher).toBeDefined();
expect(requestInit?.dispatcher).toBeInstanceOf(EnvHttpProxyAgent);
});
it("throws when readability is disabled and firecrawl is unavailable", async () => {
installMockFetch(
(input: RequestInfo | URL) =>
@ -356,7 +391,29 @@ describe("web_fetch extraction fallbacks", () => {
const tool = createFirecrawlTool();
await expect(
executeFetch(tool, { url: "https://example.com/readability-empty" }),
-    ).rejects.toThrow("Readability and Firecrawl returned no content");
+    ).rejects.toThrow("Readability, Firecrawl, and basic HTML cleanup returned no content");
});
it("falls back to basic HTML cleanup after readability and before giving up", async () => {
installMockFetch(
(input: RequestInfo | URL) =>
Promise.resolve(
htmlResponse(
"<!doctype html><html><head><title>Shell App</title></head><body><div id='app'></div></body></html>",
resolveRequestUrl(input),
),
) as Promise<Response>,
);
const tool = createFetchTool({
firecrawl: { enabled: false },
});
const result = await executeFetch(tool, { url: "https://example.com/shell" });
const details = result?.details as { extractor?: string; text?: string; title?: string };
expect(details.extractor).toBe("raw-html");
expect(details.text).toContain("Shell App");
expect(details.title).toContain("Shell App");
});
it("uses firecrawl when direct fetch fails", async () => {

@ -116,6 +116,19 @@ describe("setupSearch", () => {
expect(result.tools?.web?.search?.gemini?.apiKey).toBe("AIza-test");
});
it("sets provider and key for firecrawl and enables the plugin", async () => {
const cfg: OpenClawConfig = {};
const { prompter } = createPrompter({
selectValue: "firecrawl",
textValue: "fc-test-key",
});
const result = await setupSearch(cfg, runtime, prompter);
expect(result.tools?.web?.search?.provider).toBe("firecrawl");
expect(result.tools?.web?.search?.enabled).toBe(true);
expect(result.tools?.web?.search?.firecrawl?.apiKey).toBe("fc-test-key");
expect(result.plugins?.entries?.firecrawl?.enabled).toBe(true);
});
it("sets provider and key for grok", async () => {
const cfg: OpenClawConfig = {};
const { prompter } = createPrompter({
@ -331,9 +344,9 @@ describe("setupSearch", () => {
expect(result.tools?.web?.search?.apiKey).toBe("BSA-plain");
});
-  it("exports all 5 providers in SEARCH_PROVIDER_OPTIONS", () => {
-    expect(SEARCH_PROVIDER_OPTIONS).toHaveLength(5);
+  it("exports all 6 providers in SEARCH_PROVIDER_OPTIONS", () => {
+    expect(SEARCH_PROVIDER_OPTIONS).toHaveLength(6);
    const values = SEARCH_PROVIDER_OPTIONS.map((e) => e.value);
-    expect(values).toEqual(["brave", "gemini", "grok", "kimi", "perplexity"]);
+    expect(values).toEqual(["brave", "gemini", "grok", "kimi", "perplexity", "firecrawl"]);
});
});

@ -6,6 +6,7 @@ import {
hasConfiguredSecretInput,
normalizeSecretInputString,
} from "../config/types.secrets.js";
import { enablePluginInConfig } from "../plugins/enable.js";
import { resolvePluginWebSearchProviders } from "../plugins/web-search-providers.js";
import type { RuntimeEnv } from "../runtime.js";
import type { WizardPrompter } from "../wizard/prompts.js";
@ -15,7 +16,7 @@ export type SearchProvider = NonNullable<
NonNullable<NonNullable<NonNullable<OpenClawConfig["tools"]>["web"]>["search"]>["provider"]
>;
-const SEARCH_PROVIDER_IDS = ["brave", "gemini", "grok", "kimi", "perplexity"] as const;
+const SEARCH_PROVIDER_IDS = ["brave", "firecrawl", "gemini", "grok", "kimi", "perplexity"] as const;
function isSearchProvider(value: string): value is SearchProvider {
return (SEARCH_PROVIDER_IDS as readonly string[]).includes(value);
@ -114,17 +115,21 @@ export function applySearchKey(
if (entry) {
entry.setCredentialValue(search as Record<string, unknown>, key);
}
-  return {
+  const next = {
...config,
tools: {
...config.tools,
web: { ...config.tools?.web, search },
},
};
if (provider !== "firecrawl") {
return next;
}
return enablePluginInConfig(next, "firecrawl").config;
}
function applyProviderOnly(config: OpenClawConfig, provider: SearchProvider): OpenClawConfig {
-  return {
+  const next = {
...config,
tools: {
...config.tools,
@ -138,6 +143,10 @@ function applyProviderOnly(config: OpenClawConfig, provider: SearchProvider): Op
},
},
};
if (provider !== "firecrawl") {
return next;
}
return enablePluginInConfig(next, "firecrawl").config;
}
function preserveDisabledState(original: OpenClawConfig, result: OpenClawConfig): OpenClawConfig {

@ -16,6 +16,11 @@ vi.mock("../plugins/web-search-providers.js", () => {
envVars: ["BRAVE_API_KEY"],
getCredentialValue: (search?: Record<string, unknown>) => search?.apiKey,
},
{
id: "firecrawl",
envVars: ["FIRECRAWL_API_KEY"],
getCredentialValue: getScoped("firecrawl"),
},
{
id: "gemini",
envVars: ["GEMINI_API_KEY"],
@ -75,6 +80,21 @@ describe("web search provider config", () => {
expect(res.ok).toBe(true);
});
it("accepts firecrawl provider and config", () => {
const res = validateConfigObject(
buildWebSearchProviderConfig({
enabled: true,
provider: "firecrawl",
providerConfig: {
apiKey: "fc-test-key", // pragma: allowlist secret
baseUrl: "https://api.firecrawl.dev",
},
}),
);
expect(res.ok).toBe(true);
});
it("accepts gemini provider with no extra config", () => {
const res = validateConfigObject(
buildWebSearchProviderConfig({
@ -117,6 +137,7 @@ describe("web search provider auto-detection", () => {
beforeEach(() => {
delete process.env.BRAVE_API_KEY;
delete process.env.FIRECRAWL_API_KEY;
delete process.env.GEMINI_API_KEY;
delete process.env.KIMI_API_KEY;
delete process.env.MOONSHOT_API_KEY;
@ -146,6 +167,11 @@ describe("web search provider auto-detection", () => {
expect(resolveSearchProvider({})).toBe("gemini");
});
it("auto-detects firecrawl when only FIRECRAWL_API_KEY is set", () => {
process.env.FIRECRAWL_API_KEY = "fc-test-key"; // pragma: allowlist secret
expect(resolveSearchProvider({})).toBe("firecrawl");
});
it("auto-detects kimi when only KIMI_API_KEY is set", () => {
process.env.KIMI_API_KEY = "test-kimi-key"; // pragma: allowlist secret
expect(resolveSearchProvider({})).toBe("kimi");

@ -665,13 +665,17 @@ export const FIELD_HELP: Record<string, string> = {
"tools.message.broadcast.enabled": "Enable broadcast action (default: true).",
"tools.web.search.enabled": "Enable the web_search tool (requires a provider API key).",
"tools.web.search.provider":
-    'Search provider ("brave", "gemini", "grok", "kimi", or "perplexity"). Auto-detected from available API keys if omitted.',
+    'Search provider ("brave", "firecrawl", "gemini", "grok", "kimi", or "perplexity"). Auto-detected from available API keys if omitted.',
"tools.web.search.apiKey": "Brave Search API key (fallback: BRAVE_API_KEY env var).",
"tools.web.search.maxResults": "Number of results to return (1-10).",
"tools.web.search.timeoutSeconds": "Timeout in seconds for web_search requests.",
"tools.web.search.cacheTtlMinutes": "Cache TTL in minutes for web_search results.",
"tools.web.search.brave.mode":
'Brave Search mode: "web" (URL results) or "llm-context" (pre-extracted page content for LLM grounding).',
"tools.web.search.firecrawl.apiKey":
"Firecrawl API key for web search (fallback: FIRECRAWL_API_KEY env var).",
"tools.web.search.firecrawl.baseUrl":
'Firecrawl Search base URL override (default: "https://api.firecrawl.dev").',
"tools.web.search.gemini.apiKey":
"Gemini API key for Google Search grounding (fallback: GEMINI_API_KEY env var).",
"tools.web.search.gemini.model": 'Gemini model override (default: "gemini-2.5-flash").',

@ -221,6 +221,8 @@ export const FIELD_LABELS: Record<string, string> = {
"tools.web.search.timeoutSeconds": "Web Search Timeout (sec)",
"tools.web.search.cacheTtlMinutes": "Web Search Cache TTL (min)",
"tools.web.search.brave.mode": "Brave Search Mode",
"tools.web.search.firecrawl.apiKey": "Firecrawl Search API Key", // pragma: allowlist secret
"tools.web.search.firecrawl.baseUrl": "Firecrawl Search Base URL",
"tools.web.search.gemini.apiKey": "Gemini Search API Key", // pragma: allowlist secret
"tools.web.search.gemini.model": "Gemini Search Model",
"tools.web.search.grok.apiKey": "Grok Search API Key", // pragma: allowlist secret

@ -457,8 +457,8 @@ export type ToolsConfig = {
search?: {
/** Enable web search tool (default: true when API key is present). */
enabled?: boolean;
-      /** Search provider ("brave", "gemini", "grok", "kimi", or "perplexity"). */
-      provider?: "brave" | "gemini" | "grok" | "kimi" | "perplexity";
+      /** Search provider ("brave", "firecrawl", "gemini", "grok", "kimi", or "perplexity"). */
+      provider?: "brave" | "firecrawl" | "gemini" | "grok" | "kimi" | "perplexity";
/** Brave Search API key (optional; defaults to BRAVE_API_KEY env var). */
apiKey?: SecretInput;
/** Default search results count (1-10). */
@ -479,6 +479,13 @@ export type ToolsConfig = {
/** Model to use for grounded search (defaults to "gemini-2.5-flash"). */
model?: string;
};
/** Firecrawl-specific configuration (used when provider="firecrawl"). */
firecrawl?: {
/** Firecrawl API key (defaults to FIRECRAWL_API_KEY env var). */
apiKey?: SecretInput;
/** Base URL for API requests (defaults to "https://api.firecrawl.dev"). */
baseUrl?: string;
};
/** Grok-specific configuration (used when provider="grok"). */
grok?: {
/** API key for xAI (defaults to XAI_API_KEY env var). */

@ -266,6 +266,7 @@ export const ToolsWebSearchSchema = z
provider: z
.union([
z.literal("brave"),
z.literal("firecrawl"),
z.literal("perplexity"),
z.literal("grok"),
z.literal("gemini"),
@ -301,6 +302,13 @@ export const ToolsWebSearchSchema = z
})
.strict()
.optional(),
firecrawl: z
.object({
apiKey: SecretInputSchema.optional().register(sensitive),
baseUrl: z.string().optional(),
})
.strict()
.optional(),
kimi: z
.object({
apiKey: SecretInputSchema.optional().register(sensitive),

@ -96,6 +96,7 @@ describe("resolvePluginWebSearchProviders", () => {
entries: expect.objectContaining({
openrouter: { enabled: true },
brave: { enabled: true },
firecrawl: { enabled: true },
google: { enabled: true },
moonshot: { enabled: true },
perplexity: { enabled: true },

@ -11,6 +11,7 @@ const log = createSubsystemLogger("plugins");
const BUNDLED_WEB_SEARCH_ALLOWLIST_COMPAT_PLUGIN_IDS = [
"brave",
"firecrawl",
"google",
"moonshot",
"perplexity",