When echoTranscript is enabled in tools.media.audio config, the
transcription text is sent back to the originating chat immediately
after successful audio transcription — before the agent processes it.
This lets users verify what was heard from their voice note.
Changes:
- config/types.tools.ts: add echoTranscript (bool) and echoFormat
(string template) to MediaUnderstandingConfig
- media-understanding/apply.ts: sendTranscriptEcho() helper that
resolves channel/to from ctx, guards on isDeliverableMessageChannel,
and calls deliverOutboundPayloads best-effort
- config/schema.help.ts: help text for both new fields
- config/schema.labels.ts: labels for both new fields
- media-understanding/apply.echo-transcript.test.ts: 10 vitest cases
covering disabled/enabled/custom-format/no-audio/failed-transcription/
non-deliverable-channel/missing-from/OriginatingTo/delivery-failure
Default echoFormat: '📝 "{transcript}"'
Closes#32102
* feat: add PDF analysis tool with native provider support
New `pdf` tool for analyzing PDF documents with model-powered analysis.
Architecture:
- Native PDF path: sends raw PDF bytes directly to providers that support
inline document input (Anthropic via DocumentBlockParam, Google Gemini
via inlineData with application/pdf MIME type)
- Extraction fallback: for providers without native PDF support, extracts
text via pdfjs-dist and rasterizes pages to images via @napi-rs/canvas,
then sends through the standard vision/text completion path
Key features:
- Single PDF (`pdf` param) or multiple PDFs (`pdfs` array, up to 10)
- Page range selection (`pages` param, e.g. "1-5", "1,3,7-9")
- Model override (`model` param) and file size limits (`maxBytesMb`)
- Auto-detects provider capability and falls back gracefully
- Same security patterns as image tool (SSRF guards, sandbox support,
local path roots, workspace-only policy)
Config (agents.defaults):
- pdfModel: primary/fallbacks (defaults to imageModel, then session model)
- pdfMaxBytesMb: max PDF file size (default: 10)
- pdfMaxPages: max pages to process (default: 20)
Model catalog:
- Extended ModelInputType to include "document" alongside "text"/"image"
- Added modelSupportsDocument() capability check
Files:
- src/agents/tools/pdf-tool.ts - main tool factory
- src/agents/tools/pdf-tool.helpers.ts - helpers (page range, config, etc.)
- src/agents/tools/pdf-native-providers.ts - direct API calls for Anthropic/Google
- src/agents/tools/pdf-tool.test.ts - 43 tests covering all paths
- Modified: model-catalog.ts, openclaw-tools.ts, config schema/types/labels/help
* fix: prepare pdf tool for merge (#31319) (thanks @tyler6204)
Lands #26912 from @markshields-tl with configurable session.parentForkMaxTokens and docs/tests/changelog updates.
Co-authored-by: Mark Shields <239231357+markshields-tl@users.noreply.github.com>
Reimplements and consolidates related work:
- #24339 stale disconnect/destroyed session guards
- #25312 voice listener cleanup on stop
- #23036 restore @snazzah/davey runtime dependency
Adds Discord voice DAVE config passthrough, repeated decrypt failure
rejoin recovery, regression tests, docs, and changelog updates.
Co-authored-by: Frank Yang <frank.ekn@gmail.com>
Co-authored-by: Do Cao Hieu <admin@docaohieu.com>
* feat: add Gemini (Google Search grounding) as web_search provider
Add Gemini as a fourth web search provider alongside Brave, Perplexity,
and Grok. Uses Gemini's built-in Google Search grounding tool to return
search results with citations.
- Add runGeminiSearch() with Google Search grounding via tools API
- Resolve Gemini's grounding redirect URLs to direct URLs via parallel
HEAD requests (5s timeout, graceful fallback)
- Add Gemini config block (apiKey, model) with env var fallback
- Default model: gemini-2.5-flash (fast, cheap, grounding-capable)
- Strip API key from error messages for security
- Add config validation tests for Gemini provider
- Update docs/tools/web.md with Gemini provider documentation
Closes#13074
* feat: auto-detect search provider from available API keys
When no explicit provider is configured, resolveSearchProvider now
checks for available API keys in priority order (Brave → Gemini →
Perplexity → Grok) and selects the first provider with a valid key.
- Add auto-detection logic using existing resolve*ApiKey functions
- Export resolveSearchProvider via __testing_provider for tests
- Add 8 tests covering auto-detection, priority order, and explicit override
- Update docs/tools/web.md with auto-detection documentation
* fix: merge __testing exports, downgrade auto-detect log to debug
* fix: use defaultRuntime.log instead of .debug (not in RuntimeEnv type)
* fix: mark gemini apiKey as sensitive in zod schema
* fix: address Greptile review — add externalContent to Gemini payload, add Gemini/Grok entries to schema labels/help, remove dead schema-fields.ts
* fix(web-search): add JSON parse guard for Gemini API responses
Addresses Greptile review comment: add try/catch to handle non-JSON
responses from Gemini API gracefully, preventing runtime errors on
malformed responses.
Note: FIELD_HELP entries for gemini.apiKey and gemini.model were
already present in schema.help.ts, and gemini.apiKey was already
marked as sensitive in zod-schema.agent-runtime.ts (both fixed in
earlier commits).
* fix: use structured readResponseText result in Gemini error path
readResponseText returns { text, truncated, bytesRead }, not a string.
The Gemini error handler was using the result object directly, which
would always be truthy and never fall through to res.statusText.
Align with Perplexity/xAI/Brave error patterns.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* style: fix import order and formatting after rebase onto main
* Web search: send Gemini API key via header
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
* Config UI: add tag filters and complete schema help/labels
* Config UI: finalize tags/help polish and unblock test suite
* Protocol: regenerate Swift gateway models