feat: add firecrawl onboarding search plugin
parent be8fef3840
commit aa28d1c711
@ -11,6 +11,7 @@ Docs: https://docs.openclaw.ai
- Gateway/health monitor: add configurable stale-event thresholds and restart limits, plus per-channel and per-account `healthMonitor.enabled` overrides, while keeping the existing global disable path on `gateway.channelHealthCheckMinutes=0`. (#42107) Thanks @rstar327.
- Feishu/cards: add identity-aware structured card headers and note footers for Feishu replies and direct sends, while keeping that presentation wired through the shared outbound identity path. (#29938) Thanks @nszhsl.
- Feishu/streaming: add `onReasoningStream` and `onReasoningEnd` support to streaming cards, so `/reasoning stream` renders thinking tokens as markdown blockquotes in the same card — matching the Telegram channel's reasoning lane behavior. (#46029)
- Web tools/Firecrawl: add Firecrawl as an `onboard`/configure search provider via a bundled plugin, expose explicit `firecrawl_search` and `firecrawl_scrape` tools, and align core `web_fetch` fallback behavior with Firecrawl base-URL/env fallback plus guarded endpoint fetches.
- Refactor/channels: remove the legacy channel shim directories and point channel-specific imports directly at the extension-owned implementations. (#45967) Thanks @scoootscooob.
- Android/nodes: add `callLog.search` plus shared Call Log permission wiring so Android nodes can search recent call history through the gateway. (#44073) Thanks @lxk7280.
- Docs/Zalo: clarify the Marketplace-bot support matrix and config guidance so the Zalo channel docs match current Bot Creator behavior more closely. (#47552) Thanks @No898.
docs/refactor/firecrawl-extension.md (new file, 260 lines)
@ -0,0 +1,260 @@
|
||||
---
|
||||
summary: "Design for an opt-in Firecrawl extension that adds search/scrape value without hardwiring Firecrawl into core defaults"
|
||||
read_when:
|
||||
- Designing Firecrawl integration work
|
||||
- Evaluating web_search/web_fetch plugin seams
|
||||
- Deciding whether Firecrawl belongs in core or as an extension
|
||||
title: "Firecrawl Extension Design"
|
||||
---
|
||||
|
||||
# Firecrawl Extension Design
|
||||
|
||||
## Goal
|
||||
|
||||
Ship Firecrawl as an **opt-in extension** that adds:
|
||||
|
||||
- explicit Firecrawl tools for agents,
|
||||
- optional Firecrawl-backed `web_search` integration,
|
||||
- self-hosted support,
|
||||
- stronger security defaults than the current core fallback path,
|
||||
|
||||
without pushing Firecrawl into the default setup/onboarding path.
|
||||
|
||||
## Why this shape
|
||||
|
||||
Recent Firecrawl issues/PRs cluster into three buckets:
|
||||
|
||||
1. **Release/schema drift**
|
||||
- Several releases rejected `tools.web.fetch.firecrawl` even though docs and runtime code supported it.
|
||||
2. **Security hardening**
|
||||
- Current `fetchFirecrawlContent()` still posts to the Firecrawl endpoint with raw `fetch()`, while the main web-fetch path uses the SSRF guard.
|
||||
3. **Product pressure**
|
||||
- Users want Firecrawl-native search/scrape flows, especially for self-hosted/private setups.
|
||||
- Maintainers explicitly rejected wiring Firecrawl deeply into core defaults, setup flow, and browser behavior.
|
||||
|
||||
That combination argues for an extension, not more Firecrawl-specific logic in the default core path.
|
||||
|
||||
## Design principles
|
||||
|
||||
- **Opt-in, vendor-scoped**: no auto-enable, no setup hijack, no default tool-profile widening.
|
||||
- **Extension owns Firecrawl-specific config**: prefer plugin config over growing `tools.web.*` again.
|
||||
- **Useful on day one**: works even if core `web_search` / `web_fetch` seams stay unchanged.
|
||||
- **Security-first**: endpoint fetches use the same guarded networking posture as other web tools.
|
||||
- **Self-hosted-friendly**: config + env fallback, explicit base URL, no hosted-only assumptions.
|
||||
|
||||
## Proposed extension
|
||||
|
||||
Plugin id: `firecrawl`
|
||||
|
||||
### MVP capabilities
|
||||
|
||||
Register explicit tools:
|
||||
|
||||
- `firecrawl_search`
|
||||
- `firecrawl_scrape`
|
||||
|
||||
Optional later:
|
||||
|
||||
- `firecrawl_crawl`
|
||||
- `firecrawl_map`
|
||||
|
||||
Do **not** add Firecrawl browser automation in the first version. That was the part of PR #32543 that pulled Firecrawl too far into core behavior and drew the most maintainer concern.
|
||||
|
||||
## Config shape
|
||||
|
||||
Use plugin-scoped config:
|
||||
|
||||
```json5
|
||||
{
|
||||
plugins: {
|
||||
entries: {
|
||||
firecrawl: {
|
||||
enabled: true,
|
||||
config: {
|
||||
apiKey: "FIRECRAWL_API_KEY",
|
||||
baseUrl: "https://api.firecrawl.dev",
|
||||
timeoutSeconds: 60,
|
||||
maxAgeMs: 172800000,
|
||||
proxy: "auto",
|
||||
storeInCache: true,
|
||||
onlyMainContent: true,
|
||||
search: {
|
||||
enabled: true,
|
||||
defaultLimit: 5,
|
||||
sources: ["web"],
|
||||
categories: [],
|
||||
scrapeResults: false,
|
||||
},
|
||||
scrape: {
|
||||
formats: ["markdown"],
|
||||
fallbackForWebFetchLikeUse: false,
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
### Credential resolution
|
||||
|
||||
Precedence:

1. `plugins.entries.firecrawl.config.apiKey`
2. `FIRECRAWL_API_KEY`

Base URL precedence:

1. `plugins.entries.firecrawl.config.baseUrl`
2. `FIRECRAWL_BASE_URL`
3. `https://api.firecrawl.dev`
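A minimal sketch of that precedence (helper names and the plugin-config shape here are assumptions for illustration, not the shipped code):

```ts
// Illustrative only: mirrors the precedence above.
function resolveApiKey(pluginConfig: { apiKey?: string }, env = process.env): string | undefined {
  return pluginConfig.apiKey?.trim() || env.FIRECRAWL_API_KEY?.trim() || undefined;
}

function resolveBaseUrl(pluginConfig: { baseUrl?: string }, env = process.env): string {
  return (
    pluginConfig.baseUrl?.trim() ||
    env.FIRECRAWL_BASE_URL?.trim() ||
    "https://api.firecrawl.dev"
  );
}
```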
|
||||
### Compatibility bridge
|
||||
|
||||
For the first release, the extension may also **read** existing core config at `tools.web.fetch.firecrawl.*` as a fallback source so existing users do not need to migrate immediately.
|
||||
|
||||
Write path stays plugin-local. Do not keep expanding core Firecrawl config surfaces.
|
||||
|
||||
## Tool design
|
||||
|
||||
### `firecrawl_search`
|
||||
|
||||
Inputs:
|
||||
|
||||
- `query`
|
||||
- `limit`
|
||||
- `sources`
|
||||
- `categories`
|
||||
- `scrapeResults`
|
||||
- `timeoutSeconds`
|
||||
|
||||
Behavior:

- Calls Firecrawl `v2/search`
- Returns normalized OpenClaw-friendly result objects:
  - `title`
  - `url`
  - `snippet`
  - `source`
  - optional `content`
- Wraps result content as untrusted external content
- Cache key includes query + relevant provider params
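Sketched as a type, that normalized result could look like the following (field names follow the list above; the exact shape is up to the implementation):

```ts
// Illustrative shape for a normalized firecrawl_search result.
interface FirecrawlSearchResult {
  title: string;
  url: string;
  snippet?: string;  // short description text from Firecrawl
  source?: string;   // e.g. "web" or "news"
  content?: string;  // only present when result scraping was requested
}
```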
|
||||
Why explicit tool first:
|
||||
|
||||
- Works today without changing `tools.web.search.provider`
|
||||
- Avoids current schema/loader constraints
|
||||
- Gives users Firecrawl value immediately
|
||||
|
||||
### `firecrawl_scrape`
|
||||
|
||||
Inputs:
|
||||
|
||||
- `url`
|
||||
- `formats`
|
||||
- `onlyMainContent`
|
||||
- `maxAgeMs`
|
||||
- `proxy`
|
||||
- `storeInCache`
|
||||
- `timeoutSeconds`
|
||||
|
||||
Behavior:
|
||||
|
||||
- Calls Firecrawl `v2/scrape`
|
||||
- Returns markdown/text plus metadata:
|
||||
- `title`
|
||||
- `finalUrl`
|
||||
- `status`
|
||||
- `warning`
|
||||
- Wraps extracted content the same way `web_fetch` does
|
||||
- Shares cache semantics with web tool expectations where practical
|
||||
|
||||
Why explicit scrape tool:
|
||||
|
||||
- Sidesteps the unresolved `Readability -> Firecrawl -> basic HTML cleanup` ordering bug in core `web_fetch`
|
||||
- Gives users a deterministic “always use Firecrawl” path for JS-heavy/bot-protected sites
|
||||
|
||||
## What the extension should not do
|
||||
|
||||
- No auto-adding `browser`, `web_search`, or `web_fetch` to `tools.alsoAllow`
|
||||
- No default onboarding step in `openclaw setup`
|
||||
- No Firecrawl-specific browser session lifecycle in core
|
||||
- No change to built-in `web_fetch` fallback semantics in the extension MVP
|
||||
|
||||
## Phase plan
|
||||
|
||||
### Phase 1: extension-only, no core schema changes
|
||||
|
||||
Implement:
|
||||
|
||||
- `extensions/firecrawl/`
|
||||
- plugin config schema
|
||||
- `firecrawl_search`
|
||||
- `firecrawl_scrape`
|
||||
- tests for config resolution, endpoint selection, caching, error handling, and SSRF guard usage
|
||||
|
||||
This phase is enough to ship real user value.
|
||||
|
||||
### Phase 2: optional `web_search` provider integration
|
||||
|
||||
Support `tools.web.search.provider = "firecrawl"` only after fixing two core constraints:
|
||||
|
||||
1. `src/plugins/web-search-providers.ts` must load configured/installed web-search-provider plugins instead of a hardcoded bundled list.
|
||||
2. `src/config/types.tools.ts` and `src/config/zod-schema.agent-runtime.ts` must stop hardcoding the provider enum in a way that blocks plugin-registered ids.
|
||||
|
||||
Recommended shape:
|
||||
|
||||
- keep built-in providers documented,
|
||||
- allow any registered plugin provider id at runtime,
|
||||
- validate provider-specific config via the provider plugin or a generic provider bag.
|
||||
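A sketch of that recommended shape, assuming a runtime registry fed by plugin registration (names here are hypothetical):

```ts
// Hypothetical: built-in ids stay documented, but any plugin-registered id validates at runtime.
const builtInSearchProviders = ["brave", "gemini", "grok", "kimi", "perplexity"] as const;
const pluginSearchProviders = new Set<string>(); // populated when a plugin registers a provider

function isValidSearchProviderId(id: string): boolean {
  return (builtInSearchProviders as readonly string[]).includes(id) || pluginSearchProviders.has(id);
}
```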
|
||||
### Phase 3: optional `web_fetch` provider seam
|
||||
|
||||
Do this only if maintainers want vendor-specific fetch backends to participate in `web_fetch`.
|
||||
|
||||
Needed core addition:
|
||||
|
||||
- `registerWebFetchProvider` or equivalent fetch-backend seam
|
||||
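One possible shape for that seam (hypothetical; the document only names `registerWebFetchProvider` as a placeholder):

```ts
// Hypothetical fetch-backend seam; not an existing OpenClaw API.
interface WebFetchProvider {
  id: string; // e.g. "firecrawl"
  canHandle(url: string): boolean;
  fetchContent(params: { url: string; extractMode: "markdown" | "text" }): Promise<{
    text: string;
    finalUrl: string;
    status?: number;
  }>;
}

declare function registerWebFetchProvider(provider: WebFetchProvider): void;
```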
|
||||
Without that seam, the extension should keep `firecrawl_scrape` as an explicit tool rather than trying to patch built-in `web_fetch`.
|
||||
|
||||
## Security requirements
|
||||
|
||||
The extension must treat Firecrawl as a **trusted operator-configured endpoint**, but still harden transport:
|
||||
|
||||
- Use SSRF-guarded fetch for the Firecrawl endpoint call, not raw `fetch()`
|
||||
- Preserve self-hosted/private-network compatibility using the same trusted-web-tools endpoint policy used elsewhere
|
||||
- Never log the API key
|
||||
- Keep endpoint/base URL resolution explicit and predictable
|
||||
- Treat Firecrawl-returned content as untrusted external content
|
||||
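A minimal sketch of the guarded-call pattern this implies, using the repo's `withTrustedWebToolsEndpoint` helper the same way the bundled client later in this commit does (error handling omitted):

```ts
import { withTrustedWebToolsEndpoint } from "../../../src/agents/tools/web-guarded-fetch.js";

// Sketch: route the Firecrawl POST through the trusted-endpoint guard instead of raw fetch().
async function postToFirecrawl(endpoint: string, apiKey: string, body: unknown) {
  return withTrustedWebToolsEndpoint(
    {
      url: endpoint,
      timeoutSeconds: 60,
      init: {
        method: "POST",
        headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
        body: JSON.stringify(body),
      },
    },
    // Never log the API key; treat the parsed payload as untrusted external content.
    async ({ response }) => (await response.json()) as Record<string, unknown>,
  );
}
```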
|
||||
This mirrors the intent behind the SSRF hardening PRs without assuming Firecrawl is a hostile multi-tenant surface.
|
||||
|
||||
## Why not a skill
|
||||
|
||||
The repo already closed a Firecrawl skill PR in favor of ClawHub distribution. That is fine for optional user-installed prompt workflows, but it does not solve:
|
||||
|
||||
- deterministic tool availability,
|
||||
- provider-grade config/credential handling,
|
||||
- self-hosted endpoint support,
|
||||
- caching,
|
||||
- stable typed outputs,
|
||||
- security review on network behavior.
|
||||
|
||||
This belongs as an extension, not a prompt-only skill.
|
||||
|
||||
## Success criteria
|
||||
|
||||
- Users can install/enable one extension and get reliable Firecrawl search/scrape without touching core defaults.
|
||||
- Self-hosted Firecrawl works with config/env fallback.
|
||||
- Extension endpoint fetches use guarded networking.
|
||||
- No new Firecrawl-specific core onboarding/default behavior.
|
||||
- Core can later adopt plugin-native `web_search` / `web_fetch` seams without redesigning the extension.
|
||||
|
||||
## Recommended implementation order
|
||||
|
||||
1. Build `firecrawl_scrape`
|
||||
2. Build `firecrawl_search`
|
||||
3. Add docs and examples
|
||||
4. If desired, generalize `web_search` provider loading so the extension can back `web_search`
|
||||
5. Only then consider a true `web_fetch` provider seam
|
||||
@ -1,27 +1,71 @@
|
||||
---
|
||||
summary: "Firecrawl fallback for web_fetch (anti-bot + cached extraction)"
|
||||
summary: "Firecrawl search, scrape, and web_fetch fallback"
|
||||
read_when:
|
||||
- You want Firecrawl-backed web extraction
|
||||
- You need a Firecrawl API key
|
||||
- You want Firecrawl as a web_search provider
|
||||
- You want anti-bot extraction for web_fetch
|
||||
title: "Firecrawl"
|
||||
---
|
||||
|
||||
# Firecrawl
|
||||
|
||||
OpenClaw can use **Firecrawl** as a fallback extractor for `web_fetch`. It is a hosted
|
||||
content extraction service that supports bot circumvention and caching, which helps
|
||||
with JS-heavy sites or pages that block plain HTTP fetches.
|
||||
OpenClaw can use **Firecrawl** in three ways:
|
||||
|
||||
- as the `web_search` provider
|
||||
- as explicit plugin tools: `firecrawl_search` and `firecrawl_scrape`
|
||||
- as a fallback extractor for `web_fetch`
|
||||
|
||||
It is a hosted extraction/search service that supports bot circumvention and caching,
|
||||
which helps with JS-heavy sites or pages that block plain HTTP fetches.
|
||||
|
||||
## Get an API key
|
||||
|
||||
1. Create a Firecrawl account and generate an API key.
|
||||
2. Store it in config or set `FIRECRAWL_API_KEY` in the gateway environment.
|
||||
|
||||
## Configure Firecrawl
|
||||
## Configure Firecrawl search
|
||||
|
||||
```json5
|
||||
{
|
||||
plugins: {
|
||||
entries: {
|
||||
firecrawl: {
|
||||
enabled: true,
|
||||
},
|
||||
},
|
||||
},
|
||||
tools: {
|
||||
web: {
|
||||
search: {
|
||||
provider: "firecrawl",
|
||||
firecrawl: {
|
||||
apiKey: "FIRECRAWL_API_KEY_HERE",
|
||||
baseUrl: "https://api.firecrawl.dev",
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
Notes:
|
||||
|
||||
- Choosing Firecrawl in onboarding or `openclaw configure --section web` enables the bundled Firecrawl plugin automatically.
|
||||
- `web_search` with Firecrawl supports `query` and `count`.
|
||||
- For Firecrawl-specific controls like `sources`, `categories`, or result scraping, use `firecrawl_search`.
|
||||
|
||||
## Configure Firecrawl scrape + web_fetch fallback
|
||||
|
||||
```json5
|
||||
{
|
||||
plugins: {
|
||||
entries: {
|
||||
firecrawl: {
|
||||
enabled: true,
|
||||
},
|
||||
},
|
||||
},
|
||||
tools: {
|
||||
web: {
|
||||
fetch: {
|
||||
@ -44,6 +88,38 @@ Notes:
|
||||
- Firecrawl fallback attempts run only when an API key is available (`tools.web.fetch.firecrawl.apiKey` or `FIRECRAWL_API_KEY`).
|
||||
- `maxAgeMs` controls how old cached Firecrawl results can be, in milliseconds. The default is 2 days (172800000 ms).
|
||||
|
||||
`firecrawl_scrape` reuses the same `tools.web.fetch.firecrawl.*` settings and env vars.
|
||||
|
||||
## Firecrawl plugin tools
|
||||
|
||||
### `firecrawl_search`
|
||||
|
||||
Use this when you want Firecrawl-specific search controls instead of generic `web_search`.
|
||||
|
||||
Core parameters:
|
||||
|
||||
- `query`
|
||||
- `count`
|
||||
- `sources`
|
||||
- `categories`
|
||||
- `scrapeResults`
|
||||
- `timeoutSeconds`
|
||||
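An illustrative argument object (values are examples only):

```ts
// Example firecrawl_search arguments (illustrative values).
const searchArgs = {
  query: "openclaw firecrawl plugin",
  count: 5,
  sources: ["web"],
  categories: ["github"],
  scrapeResults: false,
  timeoutSeconds: 30,
};
```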
|
||||
### `firecrawl_scrape`
|
||||
|
||||
Use this for JS-heavy or bot-protected pages where plain `web_fetch` is weak.
|
||||
|
||||
Core parameters:
|
||||
|
||||
- `url`
|
||||
- `extractMode`
|
||||
- `maxChars`
|
||||
- `onlyMainContent`
|
||||
- `maxAgeMs`
|
||||
- `proxy`
|
||||
- `storeInCache`
|
||||
- `timeoutSeconds`
|
||||
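An illustrative argument object (values are examples; `proxy: "stealth"` relates to the bot-circumvention section below):

```ts
// Example firecrawl_scrape arguments (illustrative values).
const scrapeArgs = {
  url: "https://example.com/js-heavy-page",
  extractMode: "markdown",
  maxChars: 20000,
  onlyMainContent: true,
  maxAgeMs: 172800000, // accept Firecrawl-cached results up to 2 days old
  proxy: "stealth",
  storeInCache: true,
  timeoutSeconds: 60,
};
```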
|
||||
## Stealth / bot circumvention
|
||||
|
||||
Firecrawl exposes a **proxy mode** parameter for bot circumvention (`basic`, `stealth`, or `auto`).
|
||||
|
||||
@ -256,7 +256,7 @@ Enable with `tools.loopDetection.enabled: true` (default is `false`).
|
||||
|
||||
### `web_search`
|
||||
|
||||
Search the web using Perplexity, Brave, Gemini, Grok, or Kimi.
|
||||
Search the web using Brave, Firecrawl, Gemini, Grok, Kimi, or Perplexity.
|
||||
|
||||
Core parameters:
|
||||
|
||||
|
||||
@ -1,5 +1,5 @@
|
||||
---
|
||||
summary: "Web search + fetch tools (Brave, Gemini, Grok, Kimi, and Perplexity providers)"
|
||||
summary: "Web search + fetch tools (Brave, Firecrawl, Gemini, Grok, Kimi, and Perplexity providers)"
|
||||
read_when:
|
||||
- You want to enable web_search or web_fetch
|
||||
- You need provider API key setup
|
||||
@ -11,7 +11,7 @@ title: "Web Tools"
|
||||
|
||||
OpenClaw ships two lightweight web tools:
|
||||
|
||||
- `web_search` — Search the web using Brave Search API, Gemini with Google Search grounding, Grok, Kimi, or Perplexity Search API.
|
||||
- `web_search` — Search the web using Brave Search API, Firecrawl Search, Gemini with Google Search grounding, Grok, Kimi, or Perplexity Search API.
|
||||
- `web_fetch` — HTTP fetch + readable extraction (HTML → markdown/text).
|
||||
|
||||
These are **not** browser automation. For JS-heavy sites or logins, use the
|
||||
@ -24,18 +24,20 @@ These are **not** browser automation. For JS-heavy sites or logins, use the
|
||||
- `web_fetch` does a plain HTTP GET and extracts readable content
|
||||
(HTML → markdown/text). It does **not** execute JavaScript.
|
||||
- `web_fetch` is enabled by default (unless explicitly disabled).
|
||||
- The bundled Firecrawl plugin also adds `firecrawl_search` and `firecrawl_scrape` when enabled.
|
||||
|
||||
See [Brave Search setup](/brave-search) and [Perplexity Search setup](/perplexity) for provider-specific details.
|
||||
|
||||
## Choosing a search provider
|
||||
|
||||
| Provider | Result shape | Provider-specific filters | Notes | API key |
|
||||
| ------------------------- | ---------------------------------- | -------------------------------------------- | ------------------------------------------------------------------------------ | ------------------------------------------- |
|
||||
| **Brave Search API** | Structured results with snippets | `country`, `language`, `ui_lang`, time | Supports Brave `llm-context` mode | `BRAVE_API_KEY` |
|
||||
| **Gemini** | AI-synthesized answers + citations | — | Uses Google Search grounding | `GEMINI_API_KEY` |
|
||||
| **Grok** | AI-synthesized answers + citations | — | Uses xAI web-grounded responses | `XAI_API_KEY` |
|
||||
| **Kimi** | AI-synthesized answers + citations | — | Uses Moonshot web search | `KIMI_API_KEY` / `MOONSHOT_API_KEY` |
|
||||
| **Perplexity Search API** | Structured results with snippets | `country`, `language`, time, `domain_filter` | Supports content extraction controls; OpenRouter uses Sonar compatibility path | `PERPLEXITY_API_KEY` / `OPENROUTER_API_KEY` |
|
||||
| Provider | Result shape | Provider-specific filters | Notes | API key |
|
||||
| ------------------------- | ---------------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------------------------ | ------------------------------------------- |
|
||||
| **Brave Search API** | Structured results with snippets | `country`, `language`, `ui_lang`, time | Supports Brave `llm-context` mode | `BRAVE_API_KEY` |
|
||||
| **Firecrawl Search** | Structured results with snippets | Use `firecrawl_search` for Firecrawl-specific search options | Best for pairing search with Firecrawl scraping/extraction | `FIRECRAWL_API_KEY` |
|
||||
| **Gemini** | AI-synthesized answers + citations | — | Uses Google Search grounding | `GEMINI_API_KEY` |
|
||||
| **Grok** | AI-synthesized answers + citations | — | Uses xAI web-grounded responses | `XAI_API_KEY` |
|
||||
| **Kimi** | AI-synthesized answers + citations | — | Uses Moonshot web search | `KIMI_API_KEY` / `MOONSHOT_API_KEY` |
|
||||
| **Perplexity Search API** | Structured results with snippets | `country`, `language`, time, `domain_filter` | Supports content extraction controls; OpenRouter uses Sonar compatibility path | `PERPLEXITY_API_KEY` / `OPENROUTER_API_KEY` |
|
||||
|
||||
### Auto-detection
|
||||
|
||||
@ -46,6 +48,7 @@ The table above is alphabetical. If no `provider` is explicitly set, runtime aut
|
||||
3. **Grok** — `XAI_API_KEY` env var or `tools.web.search.grok.apiKey` config
|
||||
4. **Kimi** — `KIMI_API_KEY` / `MOONSHOT_API_KEY` env var or `tools.web.search.kimi.apiKey` config
|
||||
5. **Perplexity** — `PERPLEXITY_API_KEY`, `OPENROUTER_API_KEY`, or `tools.web.search.perplexity.apiKey` config
|
||||
6. **Firecrawl** — `FIRECRAWL_API_KEY` env var or `tools.web.search.firecrawl.apiKey` config
|
||||
|
||||
If no keys are found, it falls back to Brave (you'll get a missing-key error prompting you to configure one).
|
||||
|
||||
@ -86,6 +89,7 @@ See [Perplexity Search API Docs](https://docs.perplexity.ai/guides/search-quicks
|
||||
**Via config:** run `openclaw configure --section web`. It stores the key under the provider-specific config path:
|
||||
|
||||
- Brave: `tools.web.search.apiKey`
|
||||
- Firecrawl: `tools.web.search.firecrawl.apiKey`
|
||||
- Gemini: `tools.web.search.gemini.apiKey`
|
||||
- Grok: `tools.web.search.grok.apiKey`
|
||||
- Kimi: `tools.web.search.kimi.apiKey`
|
||||
@ -96,6 +100,7 @@ All of these fields also support SecretRef objects.
|
||||
**Via environment:** set provider env vars in the Gateway process environment:
|
||||
|
||||
- Brave: `BRAVE_API_KEY`
|
||||
- Firecrawl: `FIRECRAWL_API_KEY`
|
||||
- Gemini: `GEMINI_API_KEY`
|
||||
- Grok: `XAI_API_KEY`
|
||||
- Kimi: `KIMI_API_KEY` or `MOONSHOT_API_KEY`
|
||||
@ -121,6 +126,34 @@ For a gateway install, put these in `~/.openclaw/.env` (or your service environm
|
||||
}
|
||||
```
|
||||
|
||||
**Firecrawl Search:**
|
||||
|
||||
```json5
|
||||
{
|
||||
plugins: {
|
||||
entries: {
|
||||
firecrawl: {
|
||||
enabled: true,
|
||||
},
|
||||
},
|
||||
},
|
||||
tools: {
|
||||
web: {
|
||||
search: {
|
||||
enabled: true,
|
||||
provider: "firecrawl",
|
||||
firecrawl: {
|
||||
apiKey: "fc-...", // optional if FIRECRAWL_API_KEY is set
|
||||
baseUrl: "https://api.firecrawl.dev",
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
When you choose Firecrawl in onboarding or `openclaw configure --section web`, OpenClaw enables the bundled Firecrawl plugin automatically so `web_search`, `firecrawl_search`, and `firecrawl_scrape` are all available.
|
||||
|
||||
**Brave LLM Context mode:**
|
||||
|
||||
```json5
|
||||
@ -234,6 +267,7 @@ Search the web using your configured provider.
|
||||
- `tools.web.search.enabled` must not be `false` (default: enabled)
|
||||
- API key for your chosen provider:
|
||||
- **Brave**: `BRAVE_API_KEY` or `tools.web.search.apiKey`
|
||||
- **Firecrawl**: `FIRECRAWL_API_KEY` or `tools.web.search.firecrawl.apiKey`
|
||||
- **Gemini**: `GEMINI_API_KEY` or `tools.web.search.gemini.apiKey`
|
||||
- **Grok**: `XAI_API_KEY` or `tools.web.search.grok.apiKey`
|
||||
- **Kimi**: `KIMI_API_KEY`, `MOONSHOT_API_KEY`, or `tools.web.search.kimi.apiKey`
|
||||
@ -260,7 +294,7 @@ Search the web using your configured provider.
|
||||
|
||||
### Tool parameters
|
||||
|
||||
All parameters work for Brave and for native Perplexity Search API unless noted.
|
||||
Parameters depend on the selected provider.
|
||||
|
||||
Perplexity's OpenRouter / Sonar compatibility path supports only `query` and `freshness`.
|
||||
If you set `tools.web.search.perplexity.baseUrl` / `model`, use `OPENROUTER_API_KEY`, or configure an `sk-or-...` key, Search API-only filters return explicit errors.
|
||||
@ -279,6 +313,8 @@ If you set `tools.web.search.perplexity.baseUrl` / `model`, use `OPENROUTER_API_
|
||||
| `max_tokens` | Total content budget, default 25000 (Perplexity only) |
|
||||
| `max_tokens_per_page` | Per-page token limit, default 2048 (Perplexity only) |
|
||||
|
||||
Firecrawl `web_search` supports `query` and `count`. For Firecrawl-specific controls like `sources`, `categories`, result scraping, or scrape timeout, use `firecrawl_search` from the bundled Firecrawl plugin.
|
||||
|
||||
**Examples:**
|
||||
|
||||
```javascript
|
||||
|
||||
extensions/firecrawl/index.test.ts (new file, 100 lines)
@ -0,0 +1,100 @@
|
||||
import { describe, expect, it } from "vitest";
|
||||
import plugin from "./index.js";
|
||||
import { __testing as firecrawlClientTesting } from "./src/firecrawl-client.js";
|
||||
|
||||
describe("firecrawl plugin", () => {
|
||||
it("registers a web search provider and tools", () => {
|
||||
const tools: Array<{ name: string }> = [];
|
||||
const webSearchProviders: Array<{ id: string }> = [];
|
||||
|
||||
plugin.register?.({
|
||||
config: {},
|
||||
registerTool(tool: { name: string }) {
|
||||
tools.push(tool);
|
||||
},
|
||||
registerWebSearchProvider(provider: { id: string }) {
|
||||
webSearchProviders.push(provider);
|
||||
},
|
||||
} as never);
|
||||
|
||||
expect(webSearchProviders.map((provider) => provider.id)).toEqual(["firecrawl"]);
|
||||
expect(tools.map((tool) => tool.name)).toEqual(["firecrawl_search", "firecrawl_scrape"]);
|
||||
});
|
||||
|
||||
it("parses scrape payloads into wrapped external-content results", () => {
|
||||
const result = firecrawlClientTesting.parseFirecrawlScrapePayload({
|
||||
payload: {
|
||||
success: true,
|
||||
data: {
|
||||
markdown: "# Hello\n\nWorld",
|
||||
metadata: {
|
||||
title: "Example page",
|
||||
sourceURL: "https://example.com/final",
|
||||
statusCode: 200,
|
||||
},
|
||||
},
|
||||
},
|
||||
url: "https://example.com/start",
|
||||
extractMode: "text",
|
||||
maxChars: 1000,
|
||||
});
|
||||
|
||||
expect(result.finalUrl).toBe("https://example.com/final");
|
||||
expect(result.status).toBe(200);
|
||||
expect(result.extractor).toBe("firecrawl");
|
||||
expect(typeof result.text).toBe("string");
|
||||
});
|
||||
|
||||
it("extracts search items from flexible Firecrawl payload shapes", () => {
|
||||
const items = firecrawlClientTesting.resolveSearchItems({
|
||||
success: true,
|
||||
data: [
|
||||
{
|
||||
title: "Docs",
|
||||
url: "https://docs.example.com/path",
|
||||
description: "Reference docs",
|
||||
markdown: "Body",
|
||||
},
|
||||
],
|
||||
});
|
||||
|
||||
expect(items).toEqual([
|
||||
{
|
||||
title: "Docs",
|
||||
url: "https://docs.example.com/path",
|
||||
description: "Reference docs",
|
||||
content: "Body",
|
||||
published: undefined,
|
||||
siteName: "docs.example.com",
|
||||
},
|
||||
]);
|
||||
});
|
||||
|
||||
it("extracts search items from Firecrawl v2 data.web payloads", () => {
|
||||
const items = firecrawlClientTesting.resolveSearchItems({
|
||||
success: true,
|
||||
data: {
|
||||
web: [
|
||||
{
|
||||
title: "API Platform - OpenAI",
|
||||
url: "https://openai.com/api/",
|
||||
description: "Build on the OpenAI API platform.",
|
||||
markdown: "# API Platform",
|
||||
position: 1,
|
||||
},
|
||||
],
|
||||
},
|
||||
});
|
||||
|
||||
expect(items).toEqual([
|
||||
{
|
||||
title: "API Platform - OpenAI",
|
||||
url: "https://openai.com/api/",
|
||||
description: "Build on the OpenAI API platform.",
|
||||
content: "# API Platform",
|
||||
published: undefined,
|
||||
siteName: "openai.com",
|
||||
},
|
||||
]);
|
||||
});
|
||||
});
|
||||
extensions/firecrawl/index.ts (new file, 20 lines)
@ -0,0 +1,20 @@
|
||||
import type { AnyAgentTool } from "../../src/agents/tools/common.js";
|
||||
import { emptyPluginConfigSchema } from "../../src/plugins/config-schema.js";
|
||||
import type { OpenClawPluginApi } from "../../src/plugins/types.js";
|
||||
import { createFirecrawlScrapeTool } from "./src/firecrawl-scrape-tool.js";
|
||||
import { createFirecrawlWebSearchProvider } from "./src/firecrawl-search-provider.js";
|
||||
import { createFirecrawlSearchTool } from "./src/firecrawl-search-tool.js";
|
||||
|
||||
const firecrawlPlugin = {
|
||||
id: "firecrawl",
|
||||
name: "Firecrawl Plugin",
|
||||
description: "Bundled Firecrawl search and scrape plugin",
|
||||
configSchema: emptyPluginConfigSchema(),
|
||||
register(api: OpenClawPluginApi) {
|
||||
api.registerWebSearchProvider(createFirecrawlWebSearchProvider());
|
||||
api.registerTool(createFirecrawlSearchTool(api) as AnyAgentTool);
|
||||
api.registerTool(createFirecrawlScrapeTool(api) as AnyAgentTool);
|
||||
},
|
||||
};
|
||||
|
||||
export default firecrawlPlugin;
|
||||
extensions/firecrawl/openclaw.plugin.json (new file, 8 lines)
@ -0,0 +1,8 @@
|
||||
{
|
||||
"id": "firecrawl",
|
||||
"configSchema": {
|
||||
"type": "object",
|
||||
"additionalProperties": false,
|
||||
"properties": {}
|
||||
}
|
||||
}
|
||||
extensions/firecrawl/package.json (new file, 12 lines)
@ -0,0 +1,12 @@
|
||||
{
|
||||
"name": "@openclaw/firecrawl-plugin",
|
||||
"version": "2026.3.14",
|
||||
"private": true,
|
||||
"description": "OpenClaw Firecrawl plugin",
|
||||
"type": "module",
|
||||
"openclaw": {
|
||||
"extensions": [
|
||||
"./index.ts"
|
||||
]
|
||||
}
|
||||
}
|
||||
extensions/firecrawl/src/config.ts (new file, 159 lines)
@ -0,0 +1,159 @@
|
||||
import type { OpenClawConfig } from "../../../src/config/config.js";
|
||||
import { normalizeResolvedSecretInputString } from "../../../src/config/types.secrets.js";
|
||||
import { normalizeSecretInput } from "../../../src/utils/normalize-secret-input.js";
|
||||
|
||||
export const DEFAULT_FIRECRAWL_BASE_URL = "https://api.firecrawl.dev";
|
||||
export const DEFAULT_FIRECRAWL_SEARCH_TIMEOUT_SECONDS = 30;
|
||||
export const DEFAULT_FIRECRAWL_SCRAPE_TIMEOUT_SECONDS = 60;
|
||||
export const DEFAULT_FIRECRAWL_MAX_AGE_MS = 172_800_000;
|
||||
|
||||
type WebSearchConfig = NonNullable<OpenClawConfig["tools"]>["web"] extends infer Web
|
||||
? Web extends { search?: infer Search }
|
||||
? Search
|
||||
: undefined
|
||||
: undefined;
|
||||
|
||||
type WebFetchConfig = NonNullable<OpenClawConfig["tools"]>["web"] extends infer Web
|
||||
? Web extends { fetch?: infer Fetch }
|
||||
? Fetch
|
||||
: undefined
|
||||
: undefined;
|
||||
|
||||
type FirecrawlSearchConfig =
|
||||
| {
|
||||
apiKey?: unknown;
|
||||
baseUrl?: string;
|
||||
}
|
||||
| undefined;
|
||||
|
||||
type FirecrawlFetchConfig =
|
||||
| {
|
||||
apiKey?: unknown;
|
||||
baseUrl?: string;
|
||||
onlyMainContent?: boolean;
|
||||
maxAgeMs?: number;
|
||||
timeoutSeconds?: number;
|
||||
}
|
||||
| undefined;
|
||||
|
||||
function resolveSearchConfig(cfg?: OpenClawConfig): WebSearchConfig {
|
||||
const search = cfg?.tools?.web?.search;
|
||||
if (!search || typeof search !== "object") {
|
||||
return undefined;
|
||||
}
|
||||
return search as WebSearchConfig;
|
||||
}
|
||||
|
||||
function resolveFetchConfig(cfg?: OpenClawConfig): WebFetchConfig {
|
||||
const fetch = cfg?.tools?.web?.fetch;
|
||||
if (!fetch || typeof fetch !== "object") {
|
||||
return undefined;
|
||||
}
|
||||
return fetch as WebFetchConfig;
|
||||
}
|
||||
|
||||
export function resolveFirecrawlSearchConfig(cfg?: OpenClawConfig): FirecrawlSearchConfig {
|
||||
const search = resolveSearchConfig(cfg);
|
||||
if (!search || typeof search !== "object") {
|
||||
return undefined;
|
||||
}
|
||||
const firecrawl = "firecrawl" in search ? search.firecrawl : undefined;
|
||||
if (!firecrawl || typeof firecrawl !== "object") {
|
||||
return undefined;
|
||||
}
|
||||
return firecrawl as FirecrawlSearchConfig;
|
||||
}
|
||||
|
||||
export function resolveFirecrawlFetchConfig(cfg?: OpenClawConfig): FirecrawlFetchConfig {
|
||||
const fetch = resolveFetchConfig(cfg);
|
||||
if (!fetch || typeof fetch !== "object") {
|
||||
return undefined;
|
||||
}
|
||||
const firecrawl = "firecrawl" in fetch ? fetch.firecrawl : undefined;
|
||||
if (!firecrawl || typeof firecrawl !== "object") {
|
||||
return undefined;
|
||||
}
|
||||
return firecrawl as FirecrawlFetchConfig;
|
||||
}
|
||||
|
||||
function normalizeConfiguredSecret(value: unknown, path: string): string | undefined {
|
||||
return normalizeSecretInput(
|
||||
normalizeResolvedSecretInputString({
|
||||
value,
|
||||
path,
|
||||
}),
|
||||
);
|
||||
}
|
||||
|
||||
export function resolveFirecrawlApiKey(cfg?: OpenClawConfig): string | undefined {
|
||||
const search = resolveFirecrawlSearchConfig(cfg);
|
||||
const fetch = resolveFirecrawlFetchConfig(cfg);
|
||||
return (
|
||||
normalizeConfiguredSecret(search?.apiKey, "tools.web.search.firecrawl.apiKey") ||
|
||||
normalizeConfiguredSecret(fetch?.apiKey, "tools.web.fetch.firecrawl.apiKey") ||
|
||||
normalizeSecretInput(process.env.FIRECRAWL_API_KEY) ||
|
||||
undefined
|
||||
);
|
||||
}
|
||||
|
||||
export function resolveFirecrawlBaseUrl(cfg?: OpenClawConfig): string {
|
||||
const search = resolveFirecrawlSearchConfig(cfg);
|
||||
const fetch = resolveFirecrawlFetchConfig(cfg);
|
||||
const configured =
|
||||
(typeof search?.baseUrl === "string" ? search.baseUrl.trim() : "") ||
|
||||
(typeof fetch?.baseUrl === "string" ? fetch.baseUrl.trim() : "") ||
|
||||
normalizeSecretInput(process.env.FIRECRAWL_BASE_URL) ||
|
||||
"";
|
||||
return configured || DEFAULT_FIRECRAWL_BASE_URL;
|
||||
}
|
||||
|
||||
export function resolveFirecrawlOnlyMainContent(cfg?: OpenClawConfig, override?: boolean): boolean {
|
||||
if (typeof override === "boolean") {
|
||||
return override;
|
||||
}
|
||||
const fetch = resolveFirecrawlFetchConfig(cfg);
|
||||
if (typeof fetch?.onlyMainContent === "boolean") {
|
||||
return fetch.onlyMainContent;
|
||||
}
|
||||
return true;
|
||||
}
|
||||
|
||||
export function resolveFirecrawlMaxAgeMs(cfg?: OpenClawConfig, override?: number): number {
|
||||
if (typeof override === "number" && Number.isFinite(override) && override >= 0) {
|
||||
return Math.floor(override);
|
||||
}
|
||||
const fetch = resolveFirecrawlFetchConfig(cfg);
|
||||
if (
|
||||
typeof fetch?.maxAgeMs === "number" &&
|
||||
Number.isFinite(fetch.maxAgeMs) &&
|
||||
fetch.maxAgeMs >= 0
|
||||
) {
|
||||
return Math.floor(fetch.maxAgeMs);
|
||||
}
|
||||
return DEFAULT_FIRECRAWL_MAX_AGE_MS;
|
||||
}
|
||||
|
||||
export function resolveFirecrawlScrapeTimeoutSeconds(
|
||||
cfg?: OpenClawConfig,
|
||||
override?: number,
|
||||
): number {
|
||||
if (typeof override === "number" && Number.isFinite(override) && override > 0) {
|
||||
return Math.floor(override);
|
||||
}
|
||||
const fetch = resolveFirecrawlFetchConfig(cfg);
|
||||
if (
|
||||
typeof fetch?.timeoutSeconds === "number" &&
|
||||
Number.isFinite(fetch.timeoutSeconds) &&
|
||||
fetch.timeoutSeconds > 0
|
||||
) {
|
||||
return Math.floor(fetch.timeoutSeconds);
|
||||
}
|
||||
return DEFAULT_FIRECRAWL_SCRAPE_TIMEOUT_SECONDS;
|
||||
}
|
||||
|
||||
export function resolveFirecrawlSearchTimeoutSeconds(override?: number): number {
|
||||
if (typeof override === "number" && Number.isFinite(override) && override > 0) {
|
||||
return Math.floor(override);
|
||||
}
|
||||
return DEFAULT_FIRECRAWL_SEARCH_TIMEOUT_SECONDS;
|
||||
}
|
||||
extensions/firecrawl/src/firecrawl-client.ts (new file, 446 lines)
@ -0,0 +1,446 @@
|
||||
import { markdownToText, truncateText } from "../../../src/agents/tools/web-fetch-utils.js";
|
||||
import { withTrustedWebToolsEndpoint } from "../../../src/agents/tools/web-guarded-fetch.js";
|
||||
import {
|
||||
DEFAULT_CACHE_TTL_MINUTES,
|
||||
normalizeCacheKey,
|
||||
readCache,
|
||||
readResponseText,
|
||||
resolveCacheTtlMs,
|
||||
writeCache,
|
||||
} from "../../../src/agents/tools/web-shared.js";
|
||||
import type { OpenClawConfig } from "../../../src/config/config.js";
|
||||
import { wrapExternalContent, wrapWebContent } from "../../../src/security/external-content.js";
|
||||
import {
|
||||
resolveFirecrawlApiKey,
|
||||
resolveFirecrawlBaseUrl,
|
||||
resolveFirecrawlMaxAgeMs,
|
||||
resolveFirecrawlOnlyMainContent,
|
||||
resolveFirecrawlScrapeTimeoutSeconds,
|
||||
resolveFirecrawlSearchTimeoutSeconds,
|
||||
} from "./config.js";
|
||||
|
||||
const SEARCH_CACHE = new Map<
|
||||
string,
|
||||
{ value: Record<string, unknown>; expiresAt: number; insertedAt: number }
|
||||
>();
|
||||
const SCRAPE_CACHE = new Map<
|
||||
string,
|
||||
{ value: Record<string, unknown>; expiresAt: number; insertedAt: number }
|
||||
>();
|
||||
const DEFAULT_SEARCH_COUNT = 5;
|
||||
const DEFAULT_SCRAPE_MAX_CHARS = 50_000;
|
||||
const DEFAULT_ERROR_MAX_BYTES = 64_000;
|
||||
|
||||
type FirecrawlSearchItem = {
|
||||
title: string;
|
||||
url: string;
|
||||
description?: string;
|
||||
content?: string;
|
||||
published?: string;
|
||||
siteName?: string;
|
||||
};
|
||||
|
||||
export type FirecrawlSearchParams = {
|
||||
cfg?: OpenClawConfig;
|
||||
query: string;
|
||||
count?: number;
|
||||
timeoutSeconds?: number;
|
||||
sources?: string[];
|
||||
categories?: string[];
|
||||
scrapeResults?: boolean;
|
||||
};
|
||||
|
||||
export type FirecrawlScrapeParams = {
|
||||
cfg?: OpenClawConfig;
|
||||
url: string;
|
||||
extractMode: "markdown" | "text";
|
||||
maxChars?: number;
|
||||
onlyMainContent?: boolean;
|
||||
maxAgeMs?: number;
|
||||
proxy?: "auto" | "basic" | "stealth";
|
||||
storeInCache?: boolean;
|
||||
timeoutSeconds?: number;
|
||||
};
|
||||
|
||||
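// A baseUrl that already carries a non-root path is used verbatim as the endpoint;
// otherwise the v2 pathname is appended. Unparseable base URLs fall back to the hosted default.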
function resolveEndpoint(baseUrl: string, pathname: "/v2/search" | "/v2/scrape"): string {
|
||||
const trimmed = baseUrl.trim();
|
||||
if (!trimmed) {
|
||||
return new URL(pathname, "https://api.firecrawl.dev").toString();
|
||||
}
|
||||
try {
|
||||
const url = new URL(trimmed);
|
||||
if (url.pathname && url.pathname !== "/") {
|
||||
return url.toString();
|
||||
}
|
||||
url.pathname = pathname;
|
||||
return url.toString();
|
||||
} catch {
|
||||
return new URL(pathname, "https://api.firecrawl.dev").toString();
|
||||
}
|
||||
}
|
||||
|
||||
function resolveSiteName(urlRaw: string): string | undefined {
|
||||
try {
|
||||
const host = new URL(urlRaw).hostname.replace(/^www\./, "");
|
||||
return host || undefined;
|
||||
} catch {
|
||||
return undefined;
|
||||
}
|
||||
}
|
||||
|
||||
async function postFirecrawlJson(params: {
|
||||
baseUrl: string;
|
||||
pathname: "/v2/search" | "/v2/scrape";
|
||||
apiKey: string;
|
||||
body: Record<string, unknown>;
|
||||
timeoutSeconds: number;
|
||||
errorLabel: string;
|
||||
}): Promise<Record<string, unknown>> {
|
||||
const endpoint = resolveEndpoint(params.baseUrl, params.pathname);
|
||||
return await withTrustedWebToolsEndpoint(
|
||||
{
|
||||
url: endpoint,
|
||||
timeoutSeconds: params.timeoutSeconds,
|
||||
init: {
|
||||
method: "POST",
|
||||
headers: {
|
||||
Accept: "application/json",
|
||||
Authorization: `Bearer ${params.apiKey}`,
|
||||
"Content-Type": "application/json",
|
||||
},
|
||||
body: JSON.stringify(params.body),
|
||||
},
|
||||
},
|
||||
async ({ response }) => {
|
||||
if (!response.ok) {
|
||||
const detail = await readResponseText(response, { maxBytes: DEFAULT_ERROR_MAX_BYTES });
|
||||
throw new Error(
|
||||
`${params.errorLabel} API error (${response.status}): ${detail.text || response.statusText}`,
|
||||
);
|
||||
}
|
||||
const payload = (await response.json()) as Record<string, unknown>;
|
||||
if (payload.success === false) {
|
||||
const error =
|
||||
typeof payload.error === "string"
|
||||
? payload.error
|
||||
: typeof payload.message === "string"
|
||||
? payload.message
|
||||
: "unknown error";
|
||||
throw new Error(`${params.errorLabel} API error: ${error}`);
|
||||
}
|
||||
return payload;
|
||||
},
|
||||
);
|
||||
}
|
||||
|
||||
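// Firecrawl responses have carried result arrays under several keys across versions
// (data, results, data.results, data.data, data.web, web.results); normalize them all here.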
function resolveSearchItems(payload: Record<string, unknown>): FirecrawlSearchItem[] {
|
||||
const candidates = [
|
||||
payload.data,
|
||||
payload.results,
|
||||
(payload.data as { results?: unknown } | undefined)?.results,
|
||||
(payload.data as { data?: unknown } | undefined)?.data,
|
||||
(payload.data as { web?: unknown } | undefined)?.web,
|
||||
(payload.web as { results?: unknown } | undefined)?.results,
|
||||
];
|
||||
const rawItems = candidates.find((candidate) => Array.isArray(candidate));
|
||||
if (!Array.isArray(rawItems)) {
|
||||
return [];
|
||||
}
|
||||
const items: FirecrawlSearchItem[] = [];
|
||||
for (const entry of rawItems) {
|
||||
if (!entry || typeof entry !== "object") {
|
||||
continue;
|
||||
}
|
||||
const record = entry as Record<string, unknown>;
|
||||
const metadata =
|
||||
record.metadata && typeof record.metadata === "object"
|
||||
? (record.metadata as Record<string, unknown>)
|
||||
: undefined;
|
||||
const url =
|
||||
(typeof record.url === "string" && record.url) ||
|
||||
(typeof record.sourceURL === "string" && record.sourceURL) ||
|
||||
(typeof record.sourceUrl === "string" && record.sourceUrl) ||
|
||||
(typeof metadata?.sourceURL === "string" && metadata.sourceURL) ||
|
||||
"";
|
||||
if (!url) {
|
||||
continue;
|
||||
}
|
||||
const title =
|
||||
(typeof record.title === "string" && record.title) ||
|
||||
(typeof metadata?.title === "string" && metadata.title) ||
|
||||
"";
|
||||
const description =
|
||||
(typeof record.description === "string" && record.description) ||
|
||||
(typeof record.snippet === "string" && record.snippet) ||
|
||||
(typeof record.summary === "string" && record.summary) ||
|
||||
undefined;
|
||||
const content =
|
||||
(typeof record.markdown === "string" && record.markdown) ||
|
||||
(typeof record.content === "string" && record.content) ||
|
||||
(typeof record.text === "string" && record.text) ||
|
||||
undefined;
|
||||
const published =
|
||||
(typeof record.publishedDate === "string" && record.publishedDate) ||
|
||||
(typeof record.published === "string" && record.published) ||
|
||||
(typeof metadata?.publishedTime === "string" && metadata.publishedTime) ||
|
||||
(typeof metadata?.publishedDate === "string" && metadata.publishedDate) ||
|
||||
undefined;
|
||||
items.push({
|
||||
title,
|
||||
url,
|
||||
description,
|
||||
content,
|
||||
published,
|
||||
siteName: resolveSiteName(url),
|
||||
});
|
||||
}
|
||||
return items;
|
||||
}
|
||||
|
||||
function buildSearchPayload(params: {
|
||||
query: string;
|
||||
provider: "firecrawl";
|
||||
items: FirecrawlSearchItem[];
|
||||
tookMs: number;
|
||||
scrapeResults: boolean;
|
||||
}): Record<string, unknown> {
|
||||
return {
|
||||
query: params.query,
|
||||
provider: params.provider,
|
||||
count: params.items.length,
|
||||
tookMs: params.tookMs,
|
||||
externalContent: {
|
||||
untrusted: true,
|
||||
source: "web_search",
|
||||
provider: params.provider,
|
||||
wrapped: true,
|
||||
},
|
||||
results: params.items.map((entry) => ({
|
||||
title: entry.title ? wrapWebContent(entry.title, "web_search") : "",
|
||||
url: entry.url,
|
||||
description: entry.description ? wrapWebContent(entry.description, "web_search") : "",
|
||||
...(entry.published ? { published: entry.published } : {}),
|
||||
...(entry.siteName ? { siteName: entry.siteName } : {}),
|
||||
...(params.scrapeResults && entry.content
|
||||
? { content: wrapWebContent(entry.content, "web_search") }
|
||||
: {}),
|
||||
})),
|
||||
};
|
||||
}
|
||||
|
||||
export async function runFirecrawlSearch(
|
||||
params: FirecrawlSearchParams,
|
||||
): Promise<Record<string, unknown>> {
|
||||
const apiKey = resolveFirecrawlApiKey(params.cfg);
|
||||
if (!apiKey) {
|
||||
throw new Error(
|
||||
"web_search (firecrawl) needs a Firecrawl API key. Set FIRECRAWL_API_KEY in the Gateway environment, or configure tools.web.search.firecrawl.apiKey.",
|
||||
);
|
||||
}
|
||||
const count =
|
||||
typeof params.count === "number" && Number.isFinite(params.count)
|
||||
? Math.max(1, Math.min(10, Math.floor(params.count)))
|
||||
: DEFAULT_SEARCH_COUNT;
|
||||
const timeoutSeconds = resolveFirecrawlSearchTimeoutSeconds(params.timeoutSeconds);
|
||||
const scrapeResults = params.scrapeResults === true;
|
||||
const sources = Array.isArray(params.sources) ? params.sources.filter(Boolean) : [];
|
||||
const categories = Array.isArray(params.categories) ? params.categories.filter(Boolean) : [];
|
||||
const baseUrl = resolveFirecrawlBaseUrl(params.cfg);
|
||||
const cacheKey = normalizeCacheKey(
|
||||
JSON.stringify({
|
||||
type: "firecrawl-search",
|
||||
q: params.query,
|
||||
count,
|
||||
baseUrl,
|
||||
sources,
|
||||
categories,
|
||||
scrapeResults,
|
||||
}),
|
||||
);
|
||||
const cached = readCache(SEARCH_CACHE, cacheKey);
|
||||
if (cached) {
|
||||
return { ...cached.value, cached: true };
|
||||
}
|
||||
|
||||
const body: Record<string, unknown> = {
|
||||
query: params.query,
|
||||
limit: count,
|
||||
};
|
||||
if (sources.length > 0) {
|
||||
body.sources = sources;
|
||||
}
|
||||
if (categories.length > 0) {
|
||||
body.categories = categories;
|
||||
}
|
||||
if (scrapeResults) {
|
||||
body.scrapeOptions = {
|
||||
formats: ["markdown"],
|
||||
};
|
||||
}
|
||||
|
||||
const start = Date.now();
|
||||
const payload = await postFirecrawlJson({
|
||||
baseUrl,
|
||||
pathname: "/v2/search",
|
||||
apiKey,
|
||||
body,
|
||||
timeoutSeconds,
|
||||
errorLabel: "Firecrawl Search",
|
||||
});
|
||||
const result = buildSearchPayload({
|
||||
query: params.query,
|
||||
provider: "firecrawl",
|
||||
items: resolveSearchItems(payload),
|
||||
tookMs: Date.now() - start,
|
||||
scrapeResults,
|
||||
});
|
||||
writeCache(
|
||||
SEARCH_CACHE,
|
||||
cacheKey,
|
||||
result,
|
||||
resolveCacheTtlMs(undefined, DEFAULT_CACHE_TTL_MINUTES),
|
||||
);
|
||||
return result;
|
||||
}
|
||||
|
||||
function resolveScrapeData(payload: Record<string, unknown>): Record<string, unknown> {
|
||||
const data = payload.data;
|
||||
if (data && typeof data === "object") {
|
||||
return data as Record<string, unknown>;
|
||||
}
|
||||
return {};
|
||||
}
|
||||
|
||||
export function parseFirecrawlScrapePayload(params: {
|
||||
payload: Record<string, unknown>;
|
||||
url: string;
|
||||
extractMode: "markdown" | "text";
|
||||
maxChars: number;
|
||||
}): Record<string, unknown> {
|
||||
const data = resolveScrapeData(params.payload);
|
||||
const metadata =
|
||||
data.metadata && typeof data.metadata === "object"
|
||||
? (data.metadata as Record<string, unknown>)
|
||||
: undefined;
|
||||
const markdown =
|
||||
(typeof data.markdown === "string" && data.markdown) ||
|
||||
(typeof data.content === "string" && data.content) ||
|
||||
"";
|
||||
if (!markdown) {
|
||||
throw new Error("Firecrawl scrape returned no content.");
|
||||
}
|
||||
const rawText = params.extractMode === "text" ? markdownToText(markdown) : markdown;
|
||||
const truncated = truncateText(rawText, params.maxChars);
|
||||
return {
|
||||
url: params.url,
|
||||
finalUrl:
|
||||
(typeof metadata?.sourceURL === "string" && metadata.sourceURL) ||
|
||||
(typeof data.url === "string" && data.url) ||
|
||||
params.url,
|
||||
status:
|
||||
(typeof metadata?.statusCode === "number" && metadata.statusCode) ||
|
||||
(typeof data.statusCode === "number" && data.statusCode) ||
|
||||
undefined,
|
||||
title:
|
||||
typeof metadata?.title === "string" && metadata.title
|
||||
? wrapExternalContent(metadata.title, { source: "web_fetch", includeWarning: false })
|
||||
: undefined,
|
||||
extractor: "firecrawl",
|
||||
extractMode: params.extractMode,
|
||||
externalContent: {
|
||||
untrusted: true,
|
||||
source: "web_fetch",
|
||||
wrapped: true,
|
||||
},
|
||||
truncated: truncated.truncated,
|
||||
rawLength: rawText.length,
|
||||
wrappedLength: wrapExternalContent(truncated.text, {
|
||||
source: "web_fetch",
|
||||
includeWarning: false,
|
||||
}).length,
|
||||
text: wrapExternalContent(truncated.text, {
|
||||
source: "web_fetch",
|
||||
includeWarning: false,
|
||||
}),
|
||||
warning:
|
||||
typeof params.payload.warning === "string" && params.payload.warning
|
||||
? wrapExternalContent(params.payload.warning, {
|
||||
source: "web_fetch",
|
||||
includeWarning: false,
|
||||
})
|
||||
: undefined,
|
||||
};
|
||||
}
|
||||
|
||||
export async function runFirecrawlScrape(
|
||||
params: FirecrawlScrapeParams,
|
||||
): Promise<Record<string, unknown>> {
|
||||
const apiKey = resolveFirecrawlApiKey(params.cfg);
|
||||
if (!apiKey) {
|
||||
throw new Error(
|
||||
"firecrawl_scrape needs a Firecrawl API key. Set FIRECRAWL_API_KEY in the Gateway environment, or configure tools.web.fetch.firecrawl.apiKey.",
|
||||
);
|
||||
}
|
||||
const baseUrl = resolveFirecrawlBaseUrl(params.cfg);
|
||||
const timeoutSeconds = resolveFirecrawlScrapeTimeoutSeconds(params.cfg, params.timeoutSeconds);
|
||||
const onlyMainContent = resolveFirecrawlOnlyMainContent(params.cfg, params.onlyMainContent);
|
||||
const maxAgeMs = resolveFirecrawlMaxAgeMs(params.cfg, params.maxAgeMs);
|
||||
const proxy = params.proxy ?? "auto";
|
||||
const storeInCache = params.storeInCache ?? true;
|
||||
const maxChars =
|
||||
typeof params.maxChars === "number" && Number.isFinite(params.maxChars) && params.maxChars > 0
|
||||
? Math.floor(params.maxChars)
|
||||
: DEFAULT_SCRAPE_MAX_CHARS;
|
||||
const cacheKey = normalizeCacheKey(
|
||||
JSON.stringify({
|
||||
type: "firecrawl-scrape",
|
||||
url: params.url,
|
||||
extractMode: params.extractMode,
|
||||
baseUrl,
|
||||
onlyMainContent,
|
||||
maxAgeMs,
|
||||
proxy,
|
||||
storeInCache,
|
||||
maxChars,
|
||||
}),
|
||||
);
|
||||
const cached = readCache(SCRAPE_CACHE, cacheKey);
|
||||
if (cached) {
|
||||
return { ...cached.value, cached: true };
|
||||
}
|
||||
|
||||
const payload = await postFirecrawlJson({
|
||||
baseUrl,
|
||||
pathname: "/v2/scrape",
|
||||
apiKey,
|
||||
timeoutSeconds,
|
||||
errorLabel: "Firecrawl",
|
||||
body: {
|
||||
url: params.url,
|
||||
formats: ["markdown"],
|
||||
onlyMainContent,
|
||||
timeout: timeoutSeconds * 1000,
|
||||
maxAge: maxAgeMs,
|
||||
proxy,
|
||||
storeInCache,
|
||||
},
|
||||
});
|
||||
const result = parseFirecrawlScrapePayload({
|
||||
payload,
|
||||
url: params.url,
|
||||
extractMode: params.extractMode,
|
||||
maxChars,
|
||||
});
|
||||
writeCache(
|
||||
SCRAPE_CACHE,
|
||||
cacheKey,
|
||||
result,
|
||||
resolveCacheTtlMs(undefined, DEFAULT_CACHE_TTL_MINUTES),
|
||||
);
|
||||
return result;
|
||||
}
|
||||
|
||||
export const __testing = {
|
||||
parseFirecrawlScrapePayload,
|
||||
resolveSearchItems,
|
||||
};
|
||||
extensions/firecrawl/src/firecrawl-scrape-tool.ts (new file, 89 lines)
@ -0,0 +1,89 @@
|
||||
import { Type } from "@sinclair/typebox";
|
||||
import { optionalStringEnum } from "../../../src/agents/schema/typebox.js";
|
||||
import { jsonResult, readNumberParam, readStringParam } from "../../../src/agents/tools/common.js";
|
||||
import type { OpenClawPluginApi } from "../../../src/plugins/types.js";
|
||||
import { runFirecrawlScrape } from "./firecrawl-client.js";
|
||||
|
||||
const FirecrawlScrapeToolSchema = Type.Object(
|
||||
{
|
||||
url: Type.String({ description: "HTTP or HTTPS URL to scrape via Firecrawl." }),
|
||||
extractMode: optionalStringEnum(["markdown", "text"] as const, {
|
||||
description: 'Extraction mode ("markdown" or "text"). Default: markdown.',
|
||||
}),
|
||||
maxChars: Type.Optional(
|
||||
Type.Number({
|
||||
description: "Maximum characters to return.",
|
||||
minimum: 100,
|
||||
}),
|
||||
),
|
||||
onlyMainContent: Type.Optional(
|
||||
Type.Boolean({
|
||||
description: "Keep only main content when Firecrawl supports it.",
|
||||
}),
|
||||
),
|
||||
maxAgeMs: Type.Optional(
|
||||
Type.Number({
|
||||
description: "Maximum Firecrawl cache age in milliseconds.",
|
||||
minimum: 0,
|
||||
}),
|
||||
),
|
||||
proxy: optionalStringEnum(["auto", "basic", "stealth"] as const, {
|
||||
description: 'Firecrawl proxy mode ("auto", "basic", or "stealth").',
|
||||
}),
|
||||
storeInCache: Type.Optional(
|
||||
Type.Boolean({
|
||||
description: "Whether Firecrawl should store the scrape in its cache.",
|
||||
}),
|
||||
),
|
||||
timeoutSeconds: Type.Optional(
|
||||
Type.Number({
|
||||
description: "Timeout in seconds for the Firecrawl scrape request.",
|
||||
minimum: 1,
|
||||
}),
|
||||
),
|
||||
},
|
||||
{ additionalProperties: false },
|
||||
);
|
||||
|
||||
export function createFirecrawlScrapeTool(api: OpenClawPluginApi) {
|
||||
return {
|
||||
name: "firecrawl_scrape",
|
||||
label: "Firecrawl Scrape",
|
||||
description:
|
||||
"Scrape a page using Firecrawl v2/scrape. Useful for JS-heavy or bot-protected pages where plain web_fetch is weak.",
|
||||
parameters: FirecrawlScrapeToolSchema,
|
||||
execute: async (_toolCallId: string, rawParams: Record<string, unknown>) => {
|
||||
const url = readStringParam(rawParams, "url", { required: true });
|
||||
const extractMode =
|
||||
readStringParam(rawParams, "extractMode") === "text" ? "text" : "markdown";
|
||||
const maxChars = readNumberParam(rawParams, "maxChars", { integer: true });
|
||||
const maxAgeMs = readNumberParam(rawParams, "maxAgeMs", { integer: true });
|
||||
const timeoutSeconds = readNumberParam(rawParams, "timeoutSeconds", {
|
||||
integer: true,
|
||||
});
|
||||
const proxyRaw = readStringParam(rawParams, "proxy");
|
||||
const proxy =
|
||||
proxyRaw === "basic" || proxyRaw === "stealth" || proxyRaw === "auto"
|
||||
? proxyRaw
|
||||
: undefined;
|
||||
const onlyMainContent =
|
||||
typeof rawParams.onlyMainContent === "boolean" ? rawParams.onlyMainContent : undefined;
|
||||
const storeInCache =
|
||||
typeof rawParams.storeInCache === "boolean" ? rawParams.storeInCache : undefined;
|
||||
|
||||
return jsonResult(
|
||||
await runFirecrawlScrape({
|
||||
cfg: api.config,
|
||||
url,
|
||||
extractMode,
|
||||
maxChars,
|
||||
onlyMainContent,
|
||||
maxAgeMs,
|
||||
proxy,
|
||||
storeInCache,
|
||||
timeoutSeconds,
|
||||
}),
|
||||
);
|
||||
},
|
||||
};
|
||||
}
|
||||
extensions/firecrawl/src/firecrawl-search-provider.ts (new file, 63 lines)
@ -0,0 +1,63 @@
|
||||
import { Type } from "@sinclair/typebox";
|
||||
import type { WebSearchProviderPlugin } from "../../../src/plugins/types.js";
|
||||
import { runFirecrawlSearch } from "./firecrawl-client.js";
|
||||
|
||||
const GenericFirecrawlSearchSchema = Type.Object(
|
||||
{
|
||||
query: Type.String({ description: "Search query string." }),
|
||||
count: Type.Optional(
|
||||
Type.Number({
|
||||
description: "Number of results to return (1-10).",
|
||||
minimum: 1,
|
||||
maximum: 10,
|
||||
}),
|
||||
),
|
||||
},
|
||||
{ additionalProperties: false },
|
||||
);
|
||||
|
||||
function getScopedCredentialValue(searchConfig?: Record<string, unknown>): unknown {
|
||||
const scoped = searchConfig?.firecrawl;
|
||||
if (!scoped || typeof scoped !== "object" || Array.isArray(scoped)) {
|
||||
return undefined;
|
||||
}
|
||||
return (scoped as Record<string, unknown>).apiKey;
|
||||
}
|
||||
|
||||
function setScopedCredentialValue(
|
||||
searchConfigTarget: Record<string, unknown>,
|
||||
value: unknown,
|
||||
): void {
|
||||
const scoped = searchConfigTarget.firecrawl;
|
||||
if (!scoped || typeof scoped !== "object" || Array.isArray(scoped)) {
|
||||
searchConfigTarget.firecrawl = { apiKey: value };
|
||||
return;
|
||||
}
|
||||
(scoped as Record<string, unknown>).apiKey = value;
|
||||
}
|
||||
|
||||
export function createFirecrawlWebSearchProvider(): WebSearchProviderPlugin {
|
||||
return {
|
||||
id: "firecrawl",
|
||||
label: "Firecrawl Search",
|
||||
hint: "Structured results with optional result scraping",
|
||||
envVars: ["FIRECRAWL_API_KEY"],
|
||||
placeholder: "fc-...",
|
||||
signupUrl: "https://www.firecrawl.dev/",
|
||||
docsUrl: "https://docs.openclaw.ai/tools/firecrawl",
|
||||
autoDetectOrder: 60,
|
||||
getCredentialValue: getScopedCredentialValue,
|
||||
setCredentialValue: setScopedCredentialValue,
|
||||
createTool: (ctx) => ({
|
||||
description:
|
||||
"Search the web using Firecrawl. Returns structured results with snippets from Firecrawl Search. Use firecrawl_search for Firecrawl-specific knobs like sources or categories.",
|
||||
parameters: GenericFirecrawlSearchSchema,
|
||||
execute: async (args) =>
|
||||
await runFirecrawlSearch({
|
||||
cfg: ctx.config,
|
||||
query: typeof args.query === "string" ? args.query : "",
|
||||
count: typeof args.count === "number" ? args.count : undefined,
|
||||
}),
|
||||
}),
|
||||
};
|
||||
}
|
||||
extensions/firecrawl/src/firecrawl-search-tool.ts (new file, 76 lines)
@@ -0,0 +1,76 @@
import { Type } from "@sinclair/typebox";
import {
  jsonResult,
  readNumberParam,
  readStringArrayParam,
  readStringParam,
} from "../../../src/agents/tools/common.js";
import type { OpenClawPluginApi } from "../../../src/plugins/types.js";
import { runFirecrawlSearch } from "./firecrawl-client.js";

const FirecrawlSearchToolSchema = Type.Object(
  {
    query: Type.String({ description: "Search query string." }),
    count: Type.Optional(
      Type.Number({
        description: "Number of results to return (1-10).",
        minimum: 1,
        maximum: 10,
      }),
    ),
    sources: Type.Optional(
      Type.Array(Type.String(), {
        description: 'Optional sources list, for example ["web"], ["news"], or ["images"].',
      }),
    ),
    categories: Type.Optional(
      Type.Array(Type.String(), {
        description: 'Optional Firecrawl categories, for example ["github"] or ["research"].',
      }),
    ),
    scrapeResults: Type.Optional(
      Type.Boolean({
        description: "Include scraped result content when Firecrawl returns it.",
      }),
    ),
    timeoutSeconds: Type.Optional(
      Type.Number({
        description: "Timeout in seconds for the Firecrawl Search request.",
        minimum: 1,
      }),
    ),
  },
  { additionalProperties: false },
);

export function createFirecrawlSearchTool(api: OpenClawPluginApi) {
  return {
    name: "firecrawl_search",
    label: "Firecrawl Search",
    description:
      "Search the web using Firecrawl v2/search. Can optionally include scraped content from result pages.",
    parameters: FirecrawlSearchToolSchema,
    execute: async (_toolCallId: string, rawParams: Record<string, unknown>) => {
      const query = readStringParam(rawParams, "query", { required: true });
      const count = readNumberParam(rawParams, "count", { integer: true });
      const timeoutSeconds = readNumberParam(rawParams, "timeoutSeconds", {
        integer: true,
      });
      const sources = readStringArrayParam(rawParams, "sources");
      const categories = readStringArrayParam(rawParams, "categories");
      const scrapeResults = rawParams.scrapeResults === true;

      return jsonResult(
        await runFirecrawlSearch({
          cfg: api.config,
          query,
          count,
          timeoutSeconds,
          sources,
          categories,
          scrapeResults,
        }),
      );
    },
  };
}
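For orientation, a minimal sketch of an argument object that `FirecrawlSearchToolSchema` above should accept; the values are illustrative and not taken from this change:

const exampleFirecrawlSearchArgs = {
  query: "self-hosted crawler comparison", // required
  count: 5,                                // 1-10
  sources: ["web"],                        // optional Firecrawl sources
  categories: ["research"],                // optional Firecrawl categories
  scrapeResults: true,                     // include scraped page content when Firecrawl returns it
  timeoutSeconds: 30,                      // per-request timeout
};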
@@ -206,27 +206,33 @@ function exceedsEstimatedHtmlNestingDepth(html: string, maxDepth: number): boole
  return false;
}

export async function extractBasicHtmlContent(params: {
  html: string;
  extractMode: ExtractMode;
}): Promise<{ text: string; title?: string } | null> {
  const cleanHtml = await sanitizeHtml(params.html);
  const rendered = htmlToMarkdown(cleanHtml);
  if (params.extractMode === "text") {
    const text =
      stripInvisibleUnicode(markdownToText(rendered.text)) ||
      stripInvisibleUnicode(normalizeWhitespace(stripTags(cleanHtml)));
    return text ? { text, title: rendered.title } : null;
  }
  const text = stripInvisibleUnicode(rendered.text);
  return text ? { text, title: rendered.title } : null;
}

export async function extractReadableContent(params: {
  html: string;
  url: string;
  extractMode: ExtractMode;
}): Promise<{ text: string; title?: string } | null> {
  const cleanHtml = await sanitizeHtml(params.html);
  const fallback = (): { text: string; title?: string } => {
    const rendered = htmlToMarkdown(cleanHtml);
    if (params.extractMode === "text") {
      const text =
        stripInvisibleUnicode(markdownToText(rendered.text)) ||
        stripInvisibleUnicode(normalizeWhitespace(stripTags(cleanHtml)));
      return { text, title: rendered.title };
    }
    return { text: stripInvisibleUnicode(rendered.text), title: rendered.title };
  };
  if (
    cleanHtml.length > READABILITY_MAX_HTML_CHARS ||
    exceedsEstimatedHtmlNestingDepth(cleanHtml, READABILITY_MAX_ESTIMATED_NESTING_DEPTH)
  ) {
    return fallback();
    return null;
  }
  try {
    const { Readability, parseHTML } = await loadReadabilityDeps();
@@ -239,16 +245,17 @@ export async function extractReadableContent(params: {
    const reader = new Readability(document, { charThreshold: 0 });
    const parsed = reader.parse();
    if (!parsed?.content) {
      return fallback();
      return null;
    }
    const title = parsed.title || undefined;
    if (params.extractMode === "text") {
      const text = stripInvisibleUnicode(normalizeWhitespace(parsed.textContent ?? ""));
      return text ? { text, title } : fallback();
      return text ? { text, title } : null;
    }
    const rendered = htmlToMarkdown(parsed.content);
    return { text: stripInvisibleUnicode(rendered.text), title: title ?? rendered.title };
    const text = stripInvisibleUnicode(rendered.text);
    return text ? { text, title: title ?? rendered.title } : null;
  } catch {
    return fallback();
    return null;
  }
}
@@ -10,13 +10,14 @@ import { stringEnum } from "../schema/typebox.js";
import type { AnyAgentTool } from "./common.js";
import { jsonResult, readNumberParam, readStringParam } from "./common.js";
import {
  extractBasicHtmlContent,
  extractReadableContent,
  htmlToMarkdown,
  markdownToText,
  truncateText,
  type ExtractMode,
} from "./web-fetch-utils.js";
import { fetchWithWebToolsNetworkGuard } from "./web-guarded-fetch.js";
import { fetchWithWebToolsNetworkGuard, withTrustedWebToolsEndpoint } from "./web-guarded-fetch.js";
import {
  CacheEntry,
  DEFAULT_CACHE_TTL_MINUTES,
@@ -26,7 +27,6 @@ import {
  readResponseText,
  resolveCacheTtlMs,
  resolveTimeoutSeconds,
  withTimeout,
  writeCache,
} from "./web-shared.js";
@@ -161,11 +161,12 @@ function resolveFirecrawlEnabled(params: {
}

function resolveFirecrawlBaseUrl(firecrawl?: FirecrawlFetchConfig): string {
  const raw =
  const fromConfig =
    firecrawl && "baseUrl" in firecrawl && typeof firecrawl.baseUrl === "string"
      ? firecrawl.baseUrl.trim()
      : "";
  return raw || DEFAULT_FIRECRAWL_BASE_URL;
  const fromEnv = normalizeSecretInput(process.env.FIRECRAWL_BASE_URL);
  return fromConfig || fromEnv || DEFAULT_FIRECRAWL_BASE_URL;
}

function resolveFirecrawlOnlyMainContent(firecrawl?: FirecrawlFetchConfig): boolean {
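For clarity (not part of the diff itself), a small sketch of the base-URL precedence this hunk introduces, assuming the helper stays reachable through the `__testing` export added near the end of the file and that the default remains `https://api.firecrawl.dev`:

import { __testing as webFetchTesting } from "./web-fetch.js";

// Explicit config wins, then the FIRECRAWL_BASE_URL env var, then the built-in default.
process.env.FIRECRAWL_BASE_URL = "https://fc.internal.example";
webFetchTesting.resolveFirecrawlBaseUrl({}); // -> "https://fc.internal.example"
webFetchTesting.resolveFirecrawlBaseUrl({ baseUrl: "https://fc.selfhosted.example" }); // -> "https://fc.selfhosted.example"
delete process.env.FIRECRAWL_BASE_URL;
webFetchTesting.resolveFirecrawlBaseUrl({}); // -> "https://api.firecrawl.dev"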
@@ -381,54 +382,59 @@ export async function fetchFirecrawlContent(params: {
    proxy: params.proxy,
    storeInCache: params.storeInCache,
  };

  const res = await fetch(endpoint, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${params.apiKey}`,
      "Content-Type": "application/json",
  return await withTrustedWebToolsEndpoint(
    {
      url: endpoint,
      timeoutSeconds: params.timeoutSeconds,
      init: {
        method: "POST",
        headers: {
          Authorization: `Bearer ${params.apiKey}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify(body),
      },
    },
    body: JSON.stringify(body),
    signal: withTimeout(undefined, params.timeoutSeconds * 1000),
  });

  const payload = (await res.json()) as {
    success?: boolean;
    data?: {
      markdown?: string;
      content?: string;
      metadata?: {
        title?: string;
        sourceURL?: string;
        statusCode?: number;
    async ({ response }) => {
      const payload = (await response.json()) as {
        success?: boolean;
        data?: {
          markdown?: string;
          content?: string;
          metadata?: {
            title?: string;
            sourceURL?: string;
            statusCode?: number;
          };
        };
        warning?: string;
        error?: string;
      };
      };
    warning?: string;
    error?: string;
  };

  if (!res.ok || payload?.success === false) {
    const detail = payload?.error ?? "";
    throw new Error(
      `Firecrawl fetch failed (${res.status}): ${wrapWebContent(detail || res.statusText, "web_fetch")}`.trim(),
    );
  }
      if (!response.ok || payload?.success === false) {
        const detail = payload?.error ?? "";
        throw new Error(
          `Firecrawl fetch failed (${response.status}): ${wrapWebContent(detail || response.statusText, "web_fetch")}`.trim(),
        );
      }

  const data = payload?.data ?? {};
  const rawText =
    typeof data.markdown === "string"
      ? data.markdown
      : typeof data.content === "string"
        ? data.content
        : "";
  const text = params.extractMode === "text" ? markdownToText(rawText) : rawText;
  return {
    text,
    title: data.metadata?.title,
    finalUrl: data.metadata?.sourceURL,
    status: data.metadata?.statusCode,
    warning: payload?.warning,
  };
      const data = payload?.data ?? {};
      const rawText =
        typeof data.markdown === "string"
          ? data.markdown
          : typeof data.content === "string"
            ? data.content
            : "";
      const text = params.extractMode === "text" ? markdownToText(rawText) : rawText;
      return {
        text,
        title: data.metadata?.title,
        finalUrl: data.metadata?.sourceURL,
        status: data.metadata?.statusCode,
        warning: payload?.warning,
      };
    },
  );
}

type FirecrawlRuntimeParams = {
@@ -629,9 +635,19 @@ async function runWebFetch(params: WebFetchRuntimeParams): Promise<Record<string
        title = firecrawl.title;
        extractor = "firecrawl";
      } else {
        throw new Error(
          "Web fetch extraction failed: Readability and Firecrawl returned no content.",
        );
        const basic = await extractBasicHtmlContent({
          html: body,
          extractMode: params.extractMode,
        });
        if (basic?.text) {
          text = basic.text;
          title = basic.title;
          extractor = "raw-html";
        } else {
          throw new Error(
            "Web fetch extraction failed: Readability, Firecrawl, and basic HTML cleanup returned no content.",
          );
        }
      }
    }
  } else {
@@ -784,3 +800,7 @@ export function createWebFetchTool(options?: {
    },
  };
}

export const __testing = {
  resolveFirecrawlBaseUrl,
};
@@ -3,6 +3,7 @@ import { afterEach, beforeEach, describe, expect, it, vi } from "vitest";
import * as ssrf from "../../infra/net/ssrf.js";
import { resolveRequestUrl } from "../../plugin-sdk/request-url.js";
import { withFetchPreconnect } from "../../test-utils/fetch-mock.js";
import { __testing as webFetchTesting } from "./web-fetch.js";
import { makeFetchHeaders } from "./web-fetch.test-harness.js";
import { createWebFetchTool } from "./web-tools.js";
@@ -324,6 +325,40 @@ describe("web_fetch extraction fallbacks", () => {
    expect(authHeader).toBe("Bearer firecrawl-test-key");
  });

  it("uses FIRECRAWL_BASE_URL env var when firecrawl.baseUrl is unset", async () => {
    vi.stubEnv("FIRECRAWL_BASE_URL", "https://fc.example.com");

    expect(webFetchTesting.resolveFirecrawlBaseUrl({})).toBe("https://fc.example.com");
  });

  it("uses guarded endpoint fetch for firecrawl requests", async () => {
    vi.stubEnv("HTTP_PROXY", "http://127.0.0.1:7890");

    const fetchSpy = installMockFetch((input: RequestInfo | URL) => {
      const url = resolveRequestUrl(input);
      if (url.includes("api.firecrawl.dev/v2/scrape")) {
        return Promise.resolve(
          firecrawlResponse("firecrawl guarded transport"),
        ) as Promise<Response>;
      }
      return Promise.resolve(
        htmlResponse("<!doctype html><html><head></head><body></body></html>", url),
      ) as Promise<Response>;
    });

    const tool = createFirecrawlTool();
    const result = await executeFetch(tool, { url: "https://example.com/guarded-firecrawl" });

    expect(result?.details).toMatchObject({ extractor: "firecrawl" });
    const firecrawlCall = fetchSpy.mock.calls.find((call) =>
      resolveRequestUrl(call[0]).includes("/v2/scrape"),
    );
    expect(firecrawlCall).toBeTruthy();
    const requestInit = firecrawlCall?.[1] as (RequestInit & { dispatcher?: unknown }) | undefined;
    expect(requestInit?.dispatcher).toBeDefined();
    expect(requestInit?.dispatcher).toBeInstanceOf(EnvHttpProxyAgent);
  });

  it("throws when readability is disabled and firecrawl is unavailable", async () => {
    installMockFetch(
      (input: RequestInfo | URL) =>
@@ -356,7 +391,29 @@ describe("web_fetch extraction fallbacks", () => {
    const tool = createFirecrawlTool();
    await expect(
      executeFetch(tool, { url: "https://example.com/readability-empty" }),
    ).rejects.toThrow("Readability and Firecrawl returned no content");
    ).rejects.toThrow("Readability, Firecrawl, and basic HTML cleanup returned no content");
  });

  it("falls back to basic HTML cleanup after readability and before giving up", async () => {
    installMockFetch(
      (input: RequestInfo | URL) =>
        Promise.resolve(
          htmlResponse(
            "<!doctype html><html><head><title>Shell App</title></head><body><div id='app'></div></body></html>",
            resolveRequestUrl(input),
          ),
        ) as Promise<Response>,
    );

    const tool = createFetchTool({
      firecrawl: { enabled: false },
    });
    const result = await executeFetch(tool, { url: "https://example.com/shell" });
    const details = result?.details as { extractor?: string; text?: string; title?: string };

    expect(details.extractor).toBe("raw-html");
    expect(details.text).toContain("Shell App");
    expect(details.title).toContain("Shell App");
  });

  it("uses firecrawl when direct fetch fails", async () => {
@@ -116,6 +116,19 @@ describe("setupSearch", () => {
    expect(result.tools?.web?.search?.gemini?.apiKey).toBe("AIza-test");
  });

  it("sets provider and key for firecrawl and enables the plugin", async () => {
    const cfg: OpenClawConfig = {};
    const { prompter } = createPrompter({
      selectValue: "firecrawl",
      textValue: "fc-test-key",
    });
    const result = await setupSearch(cfg, runtime, prompter);
    expect(result.tools?.web?.search?.provider).toBe("firecrawl");
    expect(result.tools?.web?.search?.enabled).toBe(true);
    expect(result.tools?.web?.search?.firecrawl?.apiKey).toBe("fc-test-key");
    expect(result.plugins?.entries?.firecrawl?.enabled).toBe(true);
  });

  it("sets provider and key for grok", async () => {
    const cfg: OpenClawConfig = {};
    const { prompter } = createPrompter({
@@ -331,9 +344,9 @@ describe("setupSearch", () => {
    expect(result.tools?.web?.search?.apiKey).toBe("BSA-plain");
  });

  it("exports all 5 providers in SEARCH_PROVIDER_OPTIONS", () => {
    expect(SEARCH_PROVIDER_OPTIONS).toHaveLength(5);
  it("exports all 6 providers in SEARCH_PROVIDER_OPTIONS", () => {
    expect(SEARCH_PROVIDER_OPTIONS).toHaveLength(6);
    const values = SEARCH_PROVIDER_OPTIONS.map((e) => e.value);
    expect(values).toEqual(["brave", "gemini", "grok", "kimi", "perplexity"]);
    expect(values).toEqual(["brave", "gemini", "grok", "kimi", "perplexity", "firecrawl"]);
  });
});
@@ -6,6 +6,7 @@ import {
  hasConfiguredSecretInput,
  normalizeSecretInputString,
} from "../config/types.secrets.js";
import { enablePluginInConfig } from "../plugins/enable.js";
import { resolvePluginWebSearchProviders } from "../plugins/web-search-providers.js";
import type { RuntimeEnv } from "../runtime.js";
import type { WizardPrompter } from "../wizard/prompts.js";
@@ -15,7 +16,7 @@ export type SearchProvider = NonNullable<
  NonNullable<NonNullable<NonNullable<OpenClawConfig["tools"]>["web"]>["search"]>["provider"]
>;

const SEARCH_PROVIDER_IDS = ["brave", "gemini", "grok", "kimi", "perplexity"] as const;
const SEARCH_PROVIDER_IDS = ["brave", "firecrawl", "gemini", "grok", "kimi", "perplexity"] as const;

function isSearchProvider(value: string): value is SearchProvider {
  return (SEARCH_PROVIDER_IDS as readonly string[]).includes(value);
@@ -114,17 +115,21 @@ export function applySearchKey(
  if (entry) {
    entry.setCredentialValue(search as Record<string, unknown>, key);
  }
  return {
  const next = {
    ...config,
    tools: {
      ...config.tools,
      web: { ...config.tools?.web, search },
    },
  };
  if (provider !== "firecrawl") {
    return next;
  }
  return enablePluginInConfig(next, "firecrawl").config;
}

function applyProviderOnly(config: OpenClawConfig, provider: SearchProvider): OpenClawConfig {
  return {
  const next = {
    ...config,
    tools: {
      ...config.tools,
@@ -138,6 +143,10 @@ function applyProviderOnly(config: OpenClawConfig, provider: SearchProvider): Op
      },
    },
  };
  if (provider !== "firecrawl") {
    return next;
  }
  return enablePluginInConfig(next, "firecrawl").config;
}

function preserveDisabledState(original: OpenClawConfig, result: OpenClawConfig): OpenClawConfig {
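Based on the setup-test assertions above, the onboarding path is expected to leave the config in roughly this shape when Firecrawl is picked; a hedged sketch with illustrative values only:

const afterFirecrawlOnboarding = {
  tools: {
    web: {
      search: {
        enabled: true,
        provider: "firecrawl",
        firecrawl: { apiKey: "fc-example-key" },
      },
    },
  },
  plugins: {
    entries: {
      firecrawl: { enabled: true }, // enablePluginInConfig flips the bundled plugin on
    },
  },
};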
@@ -16,6 +16,11 @@ vi.mock("../plugins/web-search-providers.js", () => {
        envVars: ["BRAVE_API_KEY"],
        getCredentialValue: (search?: Record<string, unknown>) => search?.apiKey,
      },
      {
        id: "firecrawl",
        envVars: ["FIRECRAWL_API_KEY"],
        getCredentialValue: getScoped("firecrawl"),
      },
      {
        id: "gemini",
        envVars: ["GEMINI_API_KEY"],
@@ -75,6 +80,21 @@ describe("web search provider config", () => {
    expect(res.ok).toBe(true);
  });

  it("accepts firecrawl provider and config", () => {
    const res = validateConfigObject(
      buildWebSearchProviderConfig({
        enabled: true,
        provider: "firecrawl",
        providerConfig: {
          apiKey: "fc-test-key", // pragma: allowlist secret
          baseUrl: "https://api.firecrawl.dev",
        },
      }),
    );

    expect(res.ok).toBe(true);
  });

  it("accepts gemini provider with no extra config", () => {
    const res = validateConfigObject(
      buildWebSearchProviderConfig({
@@ -117,6 +137,7 @@ describe("web search provider auto-detection", () => {

  beforeEach(() => {
    delete process.env.BRAVE_API_KEY;
    delete process.env.FIRECRAWL_API_KEY;
    delete process.env.GEMINI_API_KEY;
    delete process.env.KIMI_API_KEY;
    delete process.env.MOONSHOT_API_KEY;
@@ -146,6 +167,11 @@ describe("web search provider auto-detection", () => {
    expect(resolveSearchProvider({})).toBe("gemini");
  });

  it("auto-detects firecrawl when only FIRECRAWL_API_KEY is set", () => {
    process.env.FIRECRAWL_API_KEY = "fc-test-key"; // pragma: allowlist secret
    expect(resolveSearchProvider({})).toBe("firecrawl");
  });

  it("auto-detects kimi when only KIMI_API_KEY is set", () => {
    process.env.KIMI_API_KEY = "test-kimi-key"; // pragma: allowlist secret
    expect(resolveSearchProvider({})).toBe("kimi");
@@ -665,13 +665,17 @@ export const FIELD_HELP: Record<string, string> = {
  "tools.message.broadcast.enabled": "Enable broadcast action (default: true).",
  "tools.web.search.enabled": "Enable the web_search tool (requires a provider API key).",
  "tools.web.search.provider":
    'Search provider ("brave", "gemini", "grok", "kimi", or "perplexity"). Auto-detected from available API keys if omitted.',
    'Search provider ("brave", "firecrawl", "gemini", "grok", "kimi", or "perplexity"). Auto-detected from available API keys if omitted.',
  "tools.web.search.apiKey": "Brave Search API key (fallback: BRAVE_API_KEY env var).",
  "tools.web.search.maxResults": "Number of results to return (1-10).",
  "tools.web.search.timeoutSeconds": "Timeout in seconds for web_search requests.",
  "tools.web.search.cacheTtlMinutes": "Cache TTL in minutes for web_search results.",
  "tools.web.search.brave.mode":
    'Brave Search mode: "web" (URL results) or "llm-context" (pre-extracted page content for LLM grounding).',
  "tools.web.search.firecrawl.apiKey":
    "Firecrawl API key for web search (fallback: FIRECRAWL_API_KEY env var).",
  "tools.web.search.firecrawl.baseUrl":
    'Firecrawl Search base URL override (default: "https://api.firecrawl.dev").',
  "tools.web.search.gemini.apiKey":
    "Gemini API key for Google Search grounding (fallback: GEMINI_API_KEY env var).",
  "tools.web.search.gemini.model": 'Gemini model override (default: "gemini-2.5-flash").',
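Putting the two new keys together, a minimal sketch of a self-hosted `tools.web.search` fragment that the documented fields should accept; the values are illustrative, and the API key can instead come from the FIRECRAWL_API_KEY env var:

const selfHostedFirecrawlSearch = {
  tools: {
    web: {
      search: {
        provider: "firecrawl",
        firecrawl: {
          apiKey: "fc-example-key",
          baseUrl: "https://firecrawl.internal.example", // overrides the api.firecrawl.dev default
        },
      },
    },
  },
};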
@@ -221,6 +221,8 @@ export const FIELD_LABELS: Record<string, string> = {
  "tools.web.search.timeoutSeconds": "Web Search Timeout (sec)",
  "tools.web.search.cacheTtlMinutes": "Web Search Cache TTL (min)",
  "tools.web.search.brave.mode": "Brave Search Mode",
  "tools.web.search.firecrawl.apiKey": "Firecrawl Search API Key", // pragma: allowlist secret
  "tools.web.search.firecrawl.baseUrl": "Firecrawl Search Base URL",
  "tools.web.search.gemini.apiKey": "Gemini Search API Key", // pragma: allowlist secret
  "tools.web.search.gemini.model": "Gemini Search Model",
  "tools.web.search.grok.apiKey": "Grok Search API Key", // pragma: allowlist secret
@@ -457,8 +457,8 @@ export type ToolsConfig = {
    search?: {
      /** Enable web search tool (default: true when API key is present). */
      enabled?: boolean;
      /** Search provider ("brave", "gemini", "grok", "kimi", or "perplexity"). */
      provider?: "brave" | "gemini" | "grok" | "kimi" | "perplexity";
      /** Search provider ("brave", "firecrawl", "gemini", "grok", "kimi", or "perplexity"). */
      provider?: "brave" | "firecrawl" | "gemini" | "grok" | "kimi" | "perplexity";
      /** Brave Search API key (optional; defaults to BRAVE_API_KEY env var). */
      apiKey?: SecretInput;
      /** Default search results count (1-10). */
@@ -479,6 +479,13 @@ export type ToolsConfig = {
        /** Model to use for grounded search (defaults to "gemini-2.5-flash"). */
        model?: string;
      };
      /** Firecrawl-specific configuration (used when provider="firecrawl"). */
      firecrawl?: {
        /** Firecrawl API key (defaults to FIRECRAWL_API_KEY env var). */
        apiKey?: SecretInput;
        /** Base URL for API requests (defaults to "https://api.firecrawl.dev"). */
        baseUrl?: string;
      };
      /** Grok-specific configuration (used when provider="grok"). */
      grok?: {
        /** API key for xAI (defaults to XAI_API_KEY env var). */
@@ -266,6 +266,7 @@ export const ToolsWebSearchSchema = z
    provider: z
      .union([
        z.literal("brave"),
        z.literal("firecrawl"),
        z.literal("perplexity"),
        z.literal("grok"),
        z.literal("gemini"),
@@ -301,6 +302,13 @@ export const ToolsWebSearchSchema = z
      })
      .strict()
      .optional(),
    firecrawl: z
      .object({
        apiKey: SecretInputSchema.optional().register(sensitive),
        baseUrl: z.string().optional(),
      })
      .strict()
      .optional(),
    kimi: z
      .object({
        apiKey: SecretInputSchema.optional().register(sensitive),
@@ -96,6 +96,7 @@ describe("resolvePluginWebSearchProviders", () => {
      entries: expect.objectContaining({
        openrouter: { enabled: true },
        brave: { enabled: true },
        firecrawl: { enabled: true },
        google: { enabled: true },
        moonshot: { enabled: true },
        perplexity: { enabled: true },

@@ -11,6 +11,7 @@ const log = createSubsystemLogger("plugins");

const BUNDLED_WEB_SEARCH_ALLOWLIST_COMPAT_PLUGIN_IDS = [
  "brave",
  "firecrawl",
  "google",
  "moonshot",
  "perplexity",