---
summary: "How Moltbot memory works (workspace files + automatic memory flush)"
read_when:
- You want the memory file layout and workflow
- You want to tune the automatic pre-compaction memory flush
---
# Memory
Moltbot memory is **plain Markdown in the agent workspace**. The files are the
source of truth; the model only "remembers" what gets written to disk.

Memory search tools are provided by the active memory plugin (default:
`memory-core`). Disable memory plugins with `plugins.slots.memory = "none"`.

## Memory files (Markdown)
The default workspace layout uses two memory layers:
- `memory/YYYY-MM-DD.md`
- Daily log (append-only).
- Read today + yesterday at session start.
- `MEMORY.md` (optional)
- Curated long-term memory.
- **Only load in the main, private session** (never in group contexts).
These files live under the workspace (`agents.defaults.workspace`, default
`~/clawd`). See [Agent workspace](/concepts/agent-workspace) for the full layout.
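
For orientation, the default layout looks roughly like this (dates are illustrative):

```
~/clawd/
├── MEMORY.md            # curated long-term memory (optional)
└── memory/
    ├── 2026-01-11.md    # yesterday's daily log
    └── 2026-01-12.md    # today's daily log
```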
## When to write memory
- Decisions, preferences, and durable facts go to `MEMORY.md`.
- Day-to-day notes and running context go to `memory/YYYY-MM-DD.md`.
- If someone says "remember this," write it down (do not keep it in RAM).
- This area is still evolving. It helps to remind the model to store memories; it will know what to do.
- If you want something to stick, **ask the bot to write it** into memory.

## Automatic memory flush (pre-compaction ping)
When a session is **close to auto-compaction**, Moltbot triggers a **silent,
agentic turn** that reminds the model to write durable memory **before** the
context is compacted. The default prompts explicitly say the model *may reply*,
but usually `NO_REPLY` is the correct response so the user never sees this turn.

This is controlled by `agents.defaults.compaction.memoryFlush` :
```json5
{
agents: {
defaults: {
compaction: {
reserveTokensFloor: 20000,
memoryFlush: {
enabled: true,
softThresholdTokens: 4000,
systemPrompt: "Session nearing compaction. Store durable memories now.",
prompt: "Write any lasting notes to memory/YYYY-MM-DD.md; reply with NO_REPLY if nothing to store."
}
}
}
}
}
```
Details:
- **Soft threshold**: flush triggers when the session token estimate crosses
  `contextWindow - reserveTokensFloor - softThresholdTokens`.
- **Silent** by default: prompts include `NO_REPLY` so nothing is delivered.
- **Two prompts**: a user prompt plus a system-prompt append carry the reminder.
- **One flush per compaction cycle** (tracked in `sessions.json`).
- **Workspace must be writable**: if the session runs sandboxed with
  `workspaceAccess: "ro"` or `"none"`, the flush is skipped.
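
As a quick sanity check, the soft-threshold rule can be sketched as a tiny helper (hypothetical function, not Moltbot's internal API; defaults taken from the config example above):

```typescript
// Flush fires once the session token estimate crosses
// contextWindow - reserveTokensFloor - softThresholdTokens.
function shouldFlush(
  sessionTokens: number,
  contextWindow: number,
  reserveTokensFloor = 20000,
  softThresholdTokens = 4000,
): boolean {
  return sessionTokens >= contextWindow - reserveTokensFloor - softThresholdTokens;
}
```

With a 200k context window and the defaults above, the flush fires once the estimate reaches 176,000 tokens.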
For the full compaction lifecycle, see
[Session management + compaction](/reference/session-management-compaction).

## Vector memory search

Moltbot can build a small vector index over `MEMORY.md` and `memory/*.md` so
semantic queries can find related notes even when wording differs.

Defaults:
- Enabled by default.
- Watches memory files for changes (debounced).
- Uses remote embeddings by default. If `memorySearch.provider` is not set, Moltbot auto-selects:
1. `local` if a `memorySearch.local.modelPath` is configured and the file exists.
2. `openai` if an OpenAI key can be resolved.
3. `gemini` if a Gemini key can be resolved.
4. Otherwise memory search stays disabled until configured.
- Local mode uses node-llama-cpp and may require `pnpm approve-builds`.
- Uses sqlite-vec (when available) to accelerate vector search inside SQLite.
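
The auto-selection order described above can be sketched like this (hypothetical helper and shapes; the real resolver also consults auth profiles and provider config, not just environment variables):

```typescript
import { existsSync } from "node:fs";

// Pick an embedding provider in the documented order:
// local model file → OpenAI key → Gemini key → disabled.
function autoSelectProvider(
  cfg: { local?: { modelPath?: string } },
  env: Record<string, string | undefined>,
): "local" | "openai" | "gemini" | null {
  if (cfg.local?.modelPath && existsSync(cfg.local.modelPath)) return "local";
  if (env.OPENAI_API_KEY) return "openai";
  if (env.GEMINI_API_KEY) return "gemini";
  return null; // memory search stays disabled until configured
}
```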

Remote embeddings **require** an API key for the embedding provider. Moltbot
resolves keys from auth profiles, `models.providers.*.apiKey`, or environment
variables. Codex OAuth only covers chat/completions and does **not** satisfy
embeddings for memory search. For Gemini, use `GEMINI_API_KEY` or
`models.providers.google.apiKey`. When using a custom OpenAI-compatible endpoint,
set `memorySearch.remote.apiKey` (and optional `memorySearch.remote.headers`).
### QMD backend (experimental)
Set `memory.backend = "qmd"` to swap the built-in SQLite indexer for
[QMD](https://github.com/tobi/qmd): a local-first search sidecar that combines
BM25 + vectors + reranking. Markdown stays the source of truth; Moltbot shells
out to QMD for retrieval. Key points:
**Prereqs**
- Disabled by default. Opt in per-config (`memory.backend = "qmd"`).
- Install the QMD CLI separately (`bun install -g github.com/tobi/qmd` or grab
  a release) and make sure the `qmd` binary is on the gateway’s `PATH`.
- QMD needs an SQLite build that allows extensions (`brew install sqlite` on
  macOS). The gateway sets `INDEX_PATH`/`QMD_CONFIG_DIR` automatically.
**How the sidecar runs**
- The gateway writes a self-contained QMD home under
  `~/.clawdbot/agents/<agentId>/qmd/` (config + cache + sqlite DB).
- Collections are rewritten from `memory.qmd.paths` (plus default workspace
  memory files) into `index.yml`, then `qmd update` + `qmd embed` run on boot and
  on a configurable interval (`memory.qmd.update.interval`, default 5m).
- Searches run via `qmd query --json`. If QMD fails or the binary is missing,
  Moltbot automatically falls back to the builtin SQLite manager so memory tools
  keep working.
**Config surface (`memory.qmd.*`)**
- `command` (default `qmd`): override the executable path.
- `includeDefaultMemory` (default `true`): auto-index `MEMORY.md` + `memory/**/*.md`.
- `paths[]`: add extra directories/files (`path`, optional `pattern`, optional
  stable `name`).
- `sessions`: opt into session JSONL indexing (`enabled`, `retentionDays`,
  `exportDir`).
- `update`: controls refresh cadence (`interval`, `debounceMs`, `onBoot`).
- `limits`: clamp recall payload (`maxResults`, `maxSnippetChars`,
  `maxInjectedChars`, `timeoutMs`).
- `scope`: same schema as [`session.sendPolicy`](/reference/configuration#session-sendpolicy).
  Default is DM-only (`deny` all, `allow` direct chats); loosen it to surface QMD
  hits in groups/channels.
- Snippets sourced outside the workspace show up as
  `qmd/<collection>/<relative-path>` in `memory_search` results; `memory_get`
  understands that prefix and reads from the configured QMD collection root.
- When `memory.qmd.sessions.enabled = true`, Moltbot exports sanitized session
  transcripts (User/Assistant turns) into a dedicated QMD collection under
  `~/.clawdbot/agents/<id>/qmd/sessions/`, so `memory_search` can recall recent
  conversations without touching the builtin SQLite index.
- `memory_search` snippets include a `Source: <path#line>` footer when
  `memory.citations` is `auto`/`on`; set `memory.citations = "off"` to keep
  the path metadata internal (the agent still receives the path for
  `memory_get`, but the snippet text omits the footer and the system prompt
  warns the agent not to cite it).
**Example**
```json5
memory: {
backend: "qmd",
citations: "auto",
qmd: {
includeDefaultMemory: true,
update: { interval: "5m", debounceMs: 15000 },
limits: { maxResults: 6, timeoutMs: 4000 },
scope: {
default: "deny",
rules: [{ action: "allow", match: { chatType: "direct" } }]
},
paths: [
{ name: "docs", path: "~/notes", pattern: "**/*.md" }
]
}
}
```
**Citations & fallback**
- `memory.citations` applies regardless of backend (`auto`/`on`/`off`).
- When `qmd` runs, we tag `status().backend = "qmd"` so diagnostics show which
  engine served the results. If the QMD subprocess exits or JSON output can’t be
  parsed, the search manager logs a warning and falls back to the builtin provider
  (existing Markdown embeddings) until QMD recovers.
### Additional memory paths
If you want to index Markdown files outside the default workspace layout, add
explicit paths:
```json5
agents: {
defaults: {
memorySearch: {
extraPaths: ["../team-docs", "/srv/shared-notes/overview.md"]
}
}
}
```
Notes:
- Paths can be absolute or workspace-relative.
- Directories are scanned recursively for `.md` files.
- Only Markdown files are indexed.
- Symlinks are ignored (files or directories).
### Gemini embeddings (native)
Set the provider to `gemini` to use the Gemini embeddings API directly:
```json5
agents: {
defaults: {
memorySearch: {
provider: "gemini",
model: "gemini-embedding-001",
remote: {
apiKey: "YOUR_GEMINI_API_KEY"
}
}
}
}
```
Notes:
- `remote.baseUrl` is optional (defaults to the Gemini API base URL).
- `remote.headers` lets you add extra headers if needed.
- Default model: `gemini-embedding-001`.
If you want to use a **custom OpenAI-compatible endpoint** (OpenRouter, vLLM, or a proxy),
you can use the `remote` configuration with the OpenAI provider:
```json5
agents: {
defaults: {
memorySearch: {
provider: "openai",
model: "text-embedding-3-small",
remote: {
baseUrl: "https://api.example.com/v1/",
apiKey: "YOUR_OPENAI_COMPAT_API_KEY",
headers: { "X-Custom-Header": "value" }
}
}
}
}
```
If you don't want to set an API key, use `memorySearch.provider = "local"` or set
`memorySearch.fallback = "none"`.

Fallbacks:
- `memorySearch.fallback` can be `openai`, `gemini`, `local`, or `none`.
- The fallback provider is only used when the primary embedding provider fails.
Batch indexing (OpenAI + Gemini):
- Enabled by default for OpenAI and Gemini embeddings. Set `agents.defaults.memorySearch.remote.batch.enabled = false` to disable.
- Default behavior waits for batch completion; tune `remote.batch.wait`, `remote.batch.pollIntervalMs`, and `remote.batch.timeoutMinutes` if needed.
- Set `remote.batch.concurrency` to control how many batch jobs we submit in parallel (default: 2).
- Batch mode applies when `memorySearch.provider = "openai"` or `"gemini"` and uses the corresponding API key.
- Gemini batch jobs use the async embeddings batch endpoint and require Gemini Batch API availability.
Why OpenAI batch is fast + cheap:
- For large backfills, OpenAI is typically the fastest option we support because we can submit many embedding requests in a single batch job and let OpenAI process them asynchronously.
- OpenAI offers discounted pricing for Batch API workloads, so large indexing runs are usually cheaper than sending the same requests synchronously.
- See the OpenAI Batch API docs and pricing for details:
- https://platform.openai.com/docs/api-reference/batch
- https://platform.openai.com/pricing
Config example:
```json5
agents: {
defaults: {
memorySearch: {
provider: "openai",
model: "text-embedding-3-small",
fallback: "openai",
remote: {
batch: { enabled: true, concurrency: 2 }
},
sync: { watch: true }
}
}
}
```
Tools:
- `memory_search` — returns snippets with file + line ranges.
- `memory_get` — read memory file content by path.
Local mode:
- Set `agents.defaults.memorySearch.provider = "local"`.
- Provide `agents.defaults.memorySearch.local.modelPath` (GGUF or `hf:` URI).
- Optional: set `agents.defaults.memorySearch.fallback = "none"` to avoid remote fallback.
### How the memory tools work
- `memory_search` semantically searches Markdown chunks (~400 token target, 80-token overlap) from `MEMORY.md` + `memory/**/*.md`. It returns snippet text (capped ~700 chars), file path, line range, score, provider/model, and whether we fell back from local → remote embeddings. No full file payload is returned.
- `memory_get` reads a specific memory Markdown file (workspace-relative), optionally from a starting line and for N lines. Paths outside `MEMORY.md`/`memory/` are rejected.
- Both tools are enabled only when `memorySearch.enabled` resolves true for the agent.
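
For intuition, the overlapping chunking that `memory_search` indexes over can be approximated like this (a word-based stand-in for tokens; hypothetical helper, not the real token-aware chunker):

```typescript
// Split text into windows of `size` words that overlap by `overlap` words,
// mimicking the ~400-token / 80-token-overlap scheme described above.
function chunkWords(text: string, size = 400, overlap = 80): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const step = size - overlap; // each window starts `step` words after the last
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // last window reached the end
  }
  return chunks;
}
```

The overlap means a sentence near a chunk boundary still appears whole in at least one chunk, which keeps boundary text searchable.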
### What gets indexed (and when)
- File type: Markdown only (`MEMORY.md`, `memory/**/*.md`).
- Index storage: per-agent SQLite at `~/.clawdbot/memory/<agentId>.sqlite` (configurable via `agents.defaults.memorySearch.store.path`, supports `{agentId}` token).
- Freshness: watcher on `MEMORY.md` + `memory/` marks the index dirty (debounce 1.5s). Sync is scheduled on session start, on search, or on an interval and runs asynchronously. Session transcripts use delta thresholds to trigger background sync.
- Reindex triggers: the index stores the embedding **provider/model + endpoint fingerprint + chunking params**. If any of those change, Moltbot automatically resets and reindexes the entire store.
### Hybrid search (BM25 + vector)
When enabled, Moltbot combines:
- **Vector similarity** (semantic match, wording can differ)
- **BM25 keyword relevance** (exact tokens like IDs, env vars, code symbols)
If full-text search is unavailable on your platform, Moltbot falls back to vector-only search.
#### Why hybrid?
Vector search is great at “this means the same thing”:
- “Mac Studio gateway host” vs “the machine running the gateway”
- “debounce file updates” vs “avoid indexing on every write”
But it can be weak at exact, high-signal tokens:
- IDs (`a828e60`, `b3b9895a…`)
- code symbols (`memorySearch.query.hybrid`)
- error strings (“sqlite-vec unavailable”)
BM25 (full-text) is the opposite: strong at exact tokens, weaker at paraphrases.
Hybrid search is the pragmatic middle ground: **use both retrieval signals** so you get
good results for both “natural language” queries and “needle in a haystack” queries.
#### How we merge results (the current design)
Implementation sketch:
1) Retrieve a candidate pool from both sides:
- **Vector**: top `maxResults * candidateMultiplier` by cosine similarity.
- **BM25**: top `maxResults * candidateMultiplier` by FTS5 BM25 rank (lower is better).
2) Convert BM25 rank into a 0..1-ish score:
- `textScore = 1 / (1 + max(0, bm25Rank))`
3) Union candidates by chunk id and compute a weighted score:
- `finalScore = vectorWeight * vectorScore + textWeight * textScore`
Notes:
- `vectorWeight` + `textWeight` is normalized to 1.0 in config resolution, so weights behave as percentages.
- If embeddings are unavailable (or the provider returns a zero-vector), we still run BM25 and return keyword matches.
- If FTS5 can’t be created, we keep vector-only search (no hard failure).
This isn’t “IR-theory perfect”, but it’s simple, fast, and tends to improve recall/precision on real notes.
If we want to get fancier later, common next steps are Reciprocal Rank Fusion (RRF) or score normalization
(min/max or z-score) before mixing.
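
The merge steps above can be sketched as follows (assumed candidate shapes, not Moltbot's actual internals):

```typescript
type Candidate = { id: string; vectorScore?: number; bm25Rank?: number };

// Union vector and BM25 candidates by chunk id, convert BM25 rank into a
// 0..1-ish score, then mix with weights normalized to sum to 1.0.
function mergeHybrid(
  vector: Candidate[],
  bm25: Candidate[],
  vectorWeight = 0.7,
  textWeight = 0.3,
  maxResults = 6,
): { id: string; score: number }[] {
  const total = vectorWeight + textWeight;
  const vw = vectorWeight / total;
  const tw = textWeight / total;

  const byId = new Map<string, { vectorScore: number; textScore: number }>();
  for (const c of vector) {
    byId.set(c.id, { vectorScore: c.vectorScore ?? 0, textScore: 0 });
  }
  for (const c of bm25) {
    const entry = byId.get(c.id) ?? { vectorScore: 0, textScore: 0 };
    // textScore = 1 / (1 + max(0, bm25Rank)); lower rank => higher score.
    entry.textScore = 1 / (1 + Math.max(0, c.bm25Rank ?? 0));
    byId.set(c.id, entry);
  }

  return [...byId.entries()]
    .map(([id, s]) => ({ id, score: vw * s.vectorScore + tw * s.textScore }))
    .sort((a, b) => b.score - a.score)
    .slice(0, maxResults);
}
```

A chunk that appears in only one candidate pool simply scores zero on the other signal, so exact-token hits and semantic hits can both surface.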
Config:
```json5
agents: {
defaults: {
memorySearch: {
query: {
hybrid: {
enabled: true,
vectorWeight: 0.7,
textWeight: 0.3,
candidateMultiplier: 4
}
}
}
}
}
```
### Embedding cache
Moltbot can cache **chunk embeddings** in SQLite so reindexing and frequent updates (especially session transcripts) don't re-embed unchanged text.
Config:
```json5
agents: {
defaults: {
memorySearch: {
cache: {
enabled: true,
maxEntries: 50000
}
}
}
}
```
### Session memory search (experimental)
You can optionally index **session transcripts** and surface them via `memory_search`.
This is gated behind an experimental flag.
```json5
agents: {
defaults: {
memorySearch: {
experimental: { sessionMemory: true },
sources: ["memory", "sessions"]
}
}
}
```
Notes:
- Session indexing is **opt-in** (off by default).
- Session updates are debounced and **indexed asynchronously** once they cross delta thresholds (best-effort).
- `memory_search` never blocks on indexing; results can be slightly stale until background sync finishes.
- Results still include snippets only; `memory_get` remains limited to memory files.
- Session indexing is isolated per agent (only that agent’s session logs are indexed).
- Session logs live on disk (`~/.clawdbot/agents/<agentId>/sessions/*.jsonl`). Any process/user with filesystem access can read them, so treat disk access as the trust boundary. For stricter isolation, run agents under separate OS users or hosts.
Delta thresholds (defaults shown):
```json5
agents: {
defaults: {
memorySearch: {
sync: {
sessions: {
deltaBytes: 100000, // ~100 KB
deltaMessages: 50 // JSONL lines
}
}
}
}
}
```
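
The threshold check itself is simple; a hypothetical sketch using the defaults above (not Moltbot's actual code):

```typescript
// A session is due for background sync once either delta is crossed.
function crossesDelta(
  bytesSinceSync: number,
  messagesSinceSync: number,
  deltaBytes = 100_000,
  deltaMessages = 50,
): boolean {
  return bytesSinceSync >= deltaBytes || messagesSinceSync >= deltaMessages;
}
```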
### SQLite vector acceleration (sqlite-vec)
When the sqlite-vec extension is available, Moltbot stores embeddings in a
SQLite virtual table (`vec0`) and performs vector distance queries in the
database. This keeps search fast without loading every embedding into JS.
Configuration (optional):
```json5
agents: {
defaults: {
memorySearch: {
store: {
vector: {
enabled: true,
extensionPath: "/path/to/sqlite-vec"
}
}
}
}
}
```
Notes:
- `enabled` defaults to true; when disabled, search falls back to in-process
  cosine similarity over stored embeddings.
- If the sqlite-vec extension is missing or fails to load, Moltbot logs the
  error and continues with the JS fallback (no vector table).
- `extensionPath` overrides the bundled sqlite-vec path (useful for custom builds
  or non-standard install locations).
### Local embedding auto-download
- Default local embedding model: `hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf` (~0.6 GB).
- When `memorySearch.provider = "local"`, `node-llama-cpp` resolves `modelPath`; if the GGUF is missing it **auto-downloads** to the cache (or `local.modelCacheDir` if set), then loads it. Downloads resume on retry.
- Native build requirement: run `pnpm approve-builds`, pick `node-llama-cpp`, then `pnpm rebuild node-llama-cpp`.
- Fallback: if local setup fails and `memorySearch.fallback = "openai"`, we automatically switch to remote embeddings (`openai/text-embedding-3-small` unless overridden) and record the reason.
### Custom OpenAI-compatible endpoint example
```json5
agents: {
defaults: {
memorySearch: {
provider: "openai",
model: "text-embedding-3-small",
remote: {
baseUrl: "https://api.example.com/v1/",
apiKey: "YOUR_REMOTE_API_KEY",
headers: {
"X-Organization": "org-id",
"X-Project": "project-id"
}
}
}
}
}
```
Notes:
- `remote.*` takes precedence over `models.providers.openai.*`.
- `remote.headers` merge with OpenAI headers; remote wins on key conflicts. Omit `remote.headers` to use the OpenAI defaults.