Tests: Add tooling / skill for detecting and fixing memory leaks in tests (#50654)
* Tests: add periodic heap snapshot tooling
* Skills: add test heap leak workflow
* Apply suggestion from @greptile-apps[bot]
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Update scripts/test-parallel.mjs
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
---------
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
parent da8fb70525
commit bbd62469fa
71
.agents/skills/openclaw-test-heap-leaks/SKILL.md
Normal file
@@ -0,0 +1,71 @@
---
name: openclaw-test-heap-leaks
description: Investigate `pnpm test` memory growth, Vitest worker OOMs, and suspicious RSS increases in OpenClaw using the `scripts/test-parallel.mjs` heap snapshot tooling. Use when Codex needs to reproduce test-lane memory growth, collect repeated `.heapsnapshot` files, compare snapshots from the same worker PID, distinguish transformed-module retention from real data leaks, and fix or reduce the impact by patching cleanup logic or isolating hotspot tests.
---

# OpenClaw Test Heap Leaks

Use this skill for test-memory investigations. Do not guess from RSS alone when heap snapshots are available.

## Workflow

1. Reproduce the failing shape first.
   - Match the real entrypoint if possible. For Linux CI-style unit failures, start with:
     - `pnpm canvas:a2ui:bundle && OPENCLAW_TEST_MEMORY_TRACE=1 OPENCLAW_TEST_HEAPSNAPSHOT_INTERVAL_MS=60000 OPENCLAW_TEST_HEAPSNAPSHOT_DIR=.tmp/heapsnap OPENCLAW_TEST_WORKERS=2 OPENCLAW_TEST_MAX_OLD_SPACE_SIZE_MB=6144 pnpm test`
   - Keep `OPENCLAW_TEST_MEMORY_TRACE=1` enabled so the wrapper prints per-file RSS summaries alongside the snapshots.
   - If the report is about a specific shard or worker budget, preserve that shape.

2. Wait for repeated snapshots before concluding anything.
   - Take at least two intervals from the same lane.
   - Compare snapshots from the same PID inside one lane directory such as `.tmp/heapsnap/unit-fast/`.
   - Use `scripts/heapsnapshot-delta.mjs` to compare either two files directly or the earliest/latest pair per PID in one lane directory.

3. Classify the growth before choosing a fix.
   - If growth is dominated by Vite/Vitest transformed source strings, `Module`, `system / Context`, bytecode, descriptor arrays, or property maps, treat it as retained module graph growth in long-lived workers.
   - If growth is dominated by app objects, caches, buffers, server handles, timers, mock state, sqlite state, or similar runtime objects, treat it as a likely cleanup or lifecycle leak.

4. Fix the right layer.
   - For retained transformed-module growth in shared workers:
     - Move hotspot files out of `unit-fast` by updating `test/fixtures/test-parallel.behavior.json`.
     - Prefer `singletonIsolated` for files that are safe alone but inflate shared worker heaps.
     - If the file should already have been peeled out by timings but is absent from `test/fixtures/test-timings.unit.json`, call that out explicitly. Missing timings are a scheduling blind spot.
   - For real leaks:
     - Patch the implicated test or runtime cleanup path.
     - Look for missing `afterEach`/`afterAll`, module-reset gaps, retained global state, unreleased DB handles, or listeners/timers that survive the file.

5. Verify with the most direct proof.
   - Re-run the targeted lane or file with heap snapshots enabled if the suite still finishes in reasonable time.
   - If snapshot overhead pushes tests over Vitest timeouts, fall back to the same lane without snapshots and confirm the RSS trend or OOM is reduced.
   - For wrapper-only changes, at minimum verify the expected lanes start and the snapshot files are written.

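The classification in step 3 can be sketched as a small bucketing pass over snapshot-delta rows. This is an illustrative sketch only: `classifyRow`, `summarizeGrowth`, the row shape `{ type, name, sizeDelta }`, and the matching rules (including the `__vite_ssr_` substring check) are assumptions made for the example, not part of the repo tooling, and real snapshots will need broader patterns.

```javascript
// Hypothetical helper: decide whether a delta row looks like module-graph
// retention or a runtime object. The rules are deliberately crude assumptions.
function classifyRow(row) {
  const moduleLike =
    row.type === "code" ||
    /^(?:Module|system \/ Context)$/u.test(row.name) ||
    (row.type === "string" && row.name.includes("__vite_ssr_"));
  return moduleLike ? "module-graph" : "runtime";
}

// Sum only positive growth per bucket and report which bucket dominates.
function summarizeGrowth(rows) {
  const totals = { "module-graph": 0, runtime: 0 };
  for (const row of rows) {
    if (row.sizeDelta > 0) {
      totals[classifyRow(row)] += row.sizeDelta;
    }
  }
  const verdict =
    totals["module-graph"] >= totals.runtime ? "retained module graph" : "likely cleanup leak";
  return { totals, verdict };
}

// Made-up rows for illustration; sizes are in bytes.
const sampleRows = [
  { type: "string", name: "const __vite_ssr_import_0__ = ...", sizeDelta: 4_000_000 },
  { type: "object", name: "Module", sizeDelta: 500_000 },
  { type: "object", name: "Timeout", sizeDelta: 200_000 },
  { type: "object", name: "Buffer", sizeDelta: -50_000 }, // shrinkage is ignored
];

console.log(summarizeGrowth(sampleRows));
```

A verdict from a heuristic like this is only a starting point; the dominant object families should still be read directly from the delta output.
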
## Heuristics

- Do not call everything a leak. In this repo, large `unit-fast` growth can be a worker-lifetime problem rather than an application object leak.
- `scripts/test-parallel.mjs` and `scripts/test-parallel-memory.mjs` are the primary control points for wrapper diagnostics.
- The lane names printed by `[test-parallel] start ...` and `[test-parallel][mem] summary ...` tell you where to focus.
- When one or two files account for most of the delta and they are missing from timings, reducing impact by isolating them is usually the first pragmatic fix.
- When the same retained object families grow across multiple intervals in the same worker PID, trust the snapshots over intuition.

## Snapshot Comparison

- Direct comparison:
  - `node .agents/skills/openclaw-test-heap-leaks/scripts/heapsnapshot-delta.mjs before.heapsnapshot after.heapsnapshot`
- Auto-select earliest/latest snapshots per PID within one lane:
  - `node .agents/skills/openclaw-test-heap-leaks/scripts/heapsnapshot-delta.mjs --lane-dir .tmp/heapsnap/unit-fast`
- Useful flags:
  - `--top 40`
  - `--min-kb 32`
  - `--pid 16133`

Read the top positive deltas first. Large positive growth in module-transform artifacts suggests lane isolation; large positive growth in runtime objects suggests a real leak.

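The earliest/latest selection per PID can be sketched as follows. The filename pattern is the one the helper script in this commit matches; the function name `earliestLatestByPid` and the sample file names are hypothetical and exist only for this illustration. Unlike the script, which picks a single PID, this sketch returns every PID that has at least two snapshots.

```javascript
// Filename shape matched by the helper script:
// Heap.<yyyymmdd>.<hhmmss>.<pid>.0.<seq>.heapsnapshot
const HEAP_NAME =
  /^Heap\.(?<stamp>\d{8}\.\d{6})\.(?<pid>\d+)\.0\.(?<seq>\d+)\.heapsnapshot$/u;

// Hypothetical helper: group names by PID and keep the first and last snapshot.
function earliestLatestByPid(fileNames) {
  const byPid = new Map();
  for (const name of fileNames) {
    const groups = HEAP_NAME.exec(name)?.groups;
    if (!groups) {
      continue; // not a heap snapshot written by Node
    }
    const pid = Number.parseInt(groups.pid, 10);
    const list = byPid.get(pid) ?? [];
    list.push({ name, stamp: groups.stamp, seq: Number.parseInt(groups.seq, 10) });
    byPid.set(pid, list);
  }
  const pairs = [];
  for (const [pid, list] of byPid) {
    if (list.length < 2) {
      continue; // a single snapshot cannot show growth
    }
    list.sort((a, b) => a.stamp.localeCompare(b.stamp) || a.seq - b.seq);
    pairs.push({ pid, before: list[0].name, after: list.at(-1).name, count: list.length });
  }
  return pairs;
}

// Made-up directory listing for illustration.
const laneFiles = [
  "Heap.20250101.120200.16133.0.003.heapsnapshot",
  "Heap.20250101.120000.16133.0.001.heapsnapshot",
  "Heap.20250101.120100.16133.0.002.heapsnapshot",
  "Heap.20250101.120000.16200.0.001.heapsnapshot",
  "notes.txt",
];

console.log(earliestLatestByPid(laneFiles));
```
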
## Output Expectations

When using this skill, report:

- The exact reproduce command.
- Which lane and PID were compared.
- The dominant retained object families from the snapshot delta.
- Whether the issue is a real leak or shared-worker retained module growth.
- The concrete fix or impact-reduction patch.
- What you verified, and what snapshot overhead prevented you from verifying.
@@ -0,0 +1,4 @@
interface:
  display_name: "Test Heap Leaks"
  short_description: "Investigate test OOMs with heap snapshots"
  default_prompt: "Use $openclaw-test-heap-leaks to investigate test memory growth with heap snapshots and reduce its impact."
@@ -0,0 +1,265 @@
#!/usr/bin/env node

import fs from "node:fs";
import path from "node:path";

function printUsage() {
  console.error(
    "Usage: node heapsnapshot-delta.mjs <before.heapsnapshot> <after.heapsnapshot> [--top N] [--min-kb N]",
  );
  console.error(
    " or: node heapsnapshot-delta.mjs --lane-dir <dir> [--pid PID] [--top N] [--min-kb N]",
  );
}

function fail(message) {
  console.error(message);
  process.exit(1);
}

function parseArgs(argv) {
  const options = {
    top: 30,
    minKb: 64,
    laneDir: null,
    pid: null,
    files: [],
  };

  for (let index = 0; index < argv.length; index += 1) {
    const arg = argv[index];
    if (arg === "--top") {
      options.top = Number.parseInt(argv[index + 1] ?? "", 10);
      index += 1;
      continue;
    }
    if (arg === "--min-kb") {
      options.minKb = Number.parseInt(argv[index + 1] ?? "", 10);
      index += 1;
      continue;
    }
    if (arg === "--lane-dir") {
      options.laneDir = argv[index + 1] ?? null;
      index += 1;
      continue;
    }
    if (arg === "--pid") {
      options.pid = Number.parseInt(argv[index + 1] ?? "", 10);
      index += 1;
      continue;
    }
    options.files.push(arg);
  }

  if (!Number.isFinite(options.top) || options.top <= 0) {
    fail("--top must be a positive integer");
  }
  if (!Number.isFinite(options.minKb) || options.minKb < 0) {
    fail("--min-kb must be a non-negative integer");
  }
  if (options.pid !== null && (!Number.isInteger(options.pid) || options.pid <= 0)) {
    fail("--pid must be a positive integer");
  }

  return options;
}

function parseHeapFilename(filePath) {
  const base = path.basename(filePath);
  const match = base.match(
    /^Heap\.(?<stamp>\d{8}\.\d{6})\.(?<pid>\d+)\.0\.(?<seq>\d+)\.heapsnapshot$/u,
  );
  if (!match?.groups) {
    return null;
  }
  return {
    filePath,
    pid: Number.parseInt(match.groups.pid, 10),
    stamp: match.groups.stamp,
    sequence: Number.parseInt(match.groups.seq, 10),
  };
}

function resolvePair(options) {
  if (options.laneDir) {
    const entries = fs
      .readdirSync(options.laneDir)
      .map((name) => parseHeapFilename(path.join(options.laneDir, name)))
      .filter((entry) => entry !== null)
      .filter((entry) => options.pid === null || entry.pid === options.pid)
      .toSorted((left, right) => {
        if (left.pid !== right.pid) {
          return left.pid - right.pid;
        }
        if (left.stamp !== right.stamp) {
          return left.stamp.localeCompare(right.stamp);
        }
        return left.sequence - right.sequence;
      });

    if (entries.length === 0) {
      fail(`No matching heap snapshots found in ${options.laneDir}`);
    }

    const groups = new Map();
    for (const entry of entries) {
      const group = groups.get(entry.pid) ?? [];
      group.push(entry);
      groups.set(entry.pid, group);
    }

    const candidates = Array.from(groups.values())
      .map((group) => ({
        pid: group[0].pid,
        before: group[0],
        after: group.at(-1),
        count: group.length,
      }))
      .filter((entry) => entry.count >= 2);

    if (candidates.length === 0) {
      fail(`Need at least two snapshots for one PID in ${options.laneDir}`);
    }

    const chosen =
      options.pid !== null
        ? (candidates.find((entry) => entry.pid === options.pid) ?? null)
        : candidates.toSorted((left, right) => right.count - left.count || left.pid - right.pid)[0];

    if (!chosen) {
      fail(`No PID with at least two snapshots matched in ${options.laneDir}`);
    }

    return {
      before: chosen.before.filePath,
      after: chosen.after.filePath,
      pid: chosen.pid,
      snapshotCount: chosen.count,
    };
  }

  if (options.files.length !== 2) {
    printUsage();
    process.exit(1);
  }

  return {
    before: options.files[0],
    after: options.files[1],
    pid: null,
    snapshotCount: 2,
  };
}

function loadSummary(filePath) {
  const data = JSON.parse(fs.readFileSync(filePath, "utf8"));
  const meta = data.snapshot?.meta;
  if (!meta) {
    fail(`Invalid heap snapshot: ${filePath}`);
  }

  const nodeFieldCount = meta.node_fields.length;
  const typeNames = meta.node_types[0];
  const strings = data.strings;
  const typeIndex = meta.node_fields.indexOf("type");
  const nameIndex = meta.node_fields.indexOf("name");
  const selfSizeIndex = meta.node_fields.indexOf("self_size");

  const summary = new Map();
  for (let offset = 0; offset < data.nodes.length; offset += nodeFieldCount) {
    const type = typeNames[data.nodes[offset + typeIndex]];
    const name = strings[data.nodes[offset + nameIndex]];
    const selfSize = data.nodes[offset + selfSizeIndex];
    const key = `${type}\t${name}`;
    const current = summary.get(key) ?? {
      type,
      name,
      selfSize: 0,
      count: 0,
    };
    current.selfSize += selfSize;
    current.count += 1;
    summary.set(key, current);
  }
  return {
    nodeCount: data.snapshot.node_count,
    summary,
  };
}

function formatBytes(bytes) {
  if (Math.abs(bytes) >= 1024 ** 2) {
    return `${(bytes / 1024 ** 2).toFixed(2)} MiB`;
  }
  if (Math.abs(bytes) >= 1024) {
    return `${(bytes / 1024).toFixed(1)} KiB`;
  }
  return `${bytes} B`;
}

function formatDelta(bytes) {
  return `${bytes >= 0 ? "+" : "-"}${formatBytes(Math.abs(bytes))}`;
}

function truncate(text, maxLength) {
  return text.length <= maxLength ? text : `${text.slice(0, maxLength - 1)}…`;
}

function main() {
  const options = parseArgs(process.argv.slice(2));
  const pair = resolvePair(options);
  const before = loadSummary(pair.before);
  const after = loadSummary(pair.after);
  const minBytes = options.minKb * 1024;

  const rows = [];
  for (const [key, next] of after.summary) {
    const previous = before.summary.get(key) ?? { selfSize: 0, count: 0 };
    const sizeDelta = next.selfSize - previous.selfSize;
    const countDelta = next.count - previous.count;
    if (sizeDelta < minBytes) {
      continue;
    }
    rows.push({
      type: next.type,
      name: next.name,
      sizeDelta,
      countDelta,
      afterSize: next.selfSize,
      afterCount: next.count,
    });
  }

  rows.sort(
    (left, right) => right.sizeDelta - left.sizeDelta || right.countDelta - left.countDelta,
  );

  console.log(`before: ${pair.before}`);
  console.log(`after: ${pair.after}`);
  if (pair.pid !== null) {
    console.log(`pid: ${pair.pid} (${pair.snapshotCount} snapshots found)`);
  }
  console.log(
    `nodes: ${before.nodeCount} -> ${after.nodeCount} (${after.nodeCount - before.nodeCount >= 0 ? "+" : ""}${after.nodeCount - before.nodeCount})`,
  );
  console.log(`filter: top=${options.top} min=${options.minKb} KiB`);
  console.log("");

  if (rows.length === 0) {
    console.log("No entries exceeded the minimum delta.");
    return;
  }

  for (const row of rows.slice(0, options.top)) {
    console.log(
      [
        formatDelta(row.sizeDelta).padStart(11),
        `count ${row.countDelta >= 0 ? "+" : ""}${row.countDelta}`.padStart(10),
        row.type.padEnd(16),
        truncate(row.name || "(empty)", 96),
      ].join(" "),
    );
  }
}

main();
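
The script above walks the flat `nodes` array of a heap snapshot in strides of `meta.node_fields.length`. The sketch below shows the same idea on a tiny hand-built object. The snapshot here is synthetic and abbreviated (real files carry more node fields and far more nodes), and the function name `summarizeNodes` is hypothetical.

```javascript
// Synthetic, abbreviated snapshot for illustration only.
const tinySnapshot = {
  snapshot: {
    meta: {
      node_fields: ["type", "name", "id", "self_size", "edge_count"],
      // The first entry of node_types lists the names for the "type" field.
      node_types: [["hidden", "array", "string", "object", "code", "closure"]],
    },
    node_count: 3,
  },
  // Each node occupies node_fields.length consecutive integers.
  nodes: [
    3, 0, 1, 64, 0, // object "Timeout", 64 bytes
    3, 0, 3, 64, 0, // object "Timeout", 64 bytes
    2, 1, 5, 128, 0, // string "export default 1", 128 bytes
  ],
  strings: ["Timeout", "export default 1"],
};

// Aggregate self size and count per (type, name) pair.
function summarizeNodes(data) {
  const { node_fields: fields, node_types: types } = data.snapshot.meta;
  const stride = fields.length;
  const typeAt = fields.indexOf("type");
  const nameAt = fields.indexOf("name");
  const sizeAt = fields.indexOf("self_size");
  const totals = new Map();
  for (let offset = 0; offset < data.nodes.length; offset += stride) {
    const type = types[0][data.nodes[offset + typeAt]];
    const name = data.strings[data.nodes[offset + nameAt]];
    const key = `${type}\t${name}`;
    const entry = totals.get(key) ?? { selfSize: 0, count: 0 };
    entry.selfSize += data.nodes[offset + sizeAt];
    entry.count += 1;
    totals.set(key, entry);
  }
  return totals;
}

console.log(summarizeNodes(tinySnapshot));
```

Keying on type plus name keeps, for example, a string and an object with the same label in separate rows, which matters when transformed source strings and runtime objects need to be told apart.
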
@@ -11,7 +11,7 @@ const ANSI_ESCAPE_PATTERN = new RegExp(
 const COMPLETED_TEST_FILE_LINE_PATTERN =
   /(?<file>(?:src|extensions|test|ui)\/\S+?\.(?:live\.test|e2e\.test|test)\.ts)\s+\(.*\)\s+(?<duration>\d+(?:\.\d+)?)(?<unit>ms|s)\s*$/;
 
-const PS_COLUMNS = ["pid=", "ppid=", "rss="];
+const PS_COLUMNS = ["pid=", "ppid=", "rss=", "comm="];
 
 function parseDurationMs(rawValue, unit) {
   const parsed = Number.parseFloat(rawValue);
@@ -41,7 +41,7 @@ export function parseCompletedTestFileLines(text) {
     .filter((entry) => entry !== null);
 }
 
-export function sampleProcessTreeRssKb(rootPid) {
+export function getProcessTreeRecords(rootPid) {
   if (!Number.isInteger(rootPid) || rootPid <= 0 || process.platform === "win32") {
     return null;
   }
@@ -54,13 +54,13 @@ export function sampleProcessTreeRssKb(rootPid) {
   }
 
   const childPidsByParent = new Map();
-  const rssByPid = new Map();
+  const recordsByPid = new Map();
   for (const line of result.stdout.split(/\r?\n/u)) {
     const trimmed = line.trim();
     if (!trimmed) {
       continue;
     }
-    const [pidRaw, parentRaw, rssRaw] = trimmed.split(/\s+/u);
+    const [pidRaw, parentRaw, rssRaw, commandRaw] = trimmed.split(/\s+/u, 4);
     const pid = Number.parseInt(pidRaw ?? "", 10);
     const parentPid = Number.parseInt(parentRaw ?? "", 10);
     const rssKb = Number.parseInt(rssRaw ?? "", 10);
@@ -70,27 +70,30 @@ export function sampleProcessTreeRssKb(rootPid) {
     const siblings = childPidsByParent.get(parentPid) ?? [];
     siblings.push(pid);
     childPidsByParent.set(parentPid, siblings);
-    rssByPid.set(pid, rssKb);
+    recordsByPid.set(pid, {
+      pid,
+      parentPid,
+      rssKb,
+      command: commandRaw ?? "",
+    });
   }
 
-  if (!rssByPid.has(rootPid)) {
+  if (!recordsByPid.has(rootPid)) {
     return null;
   }
 
-  let rssKb = 0;
-  let processCount = 0;
   const queue = [rootPid];
   const visited = new Set();
+  const records = [];
   while (queue.length > 0) {
     const pid = queue.shift();
     if (pid === undefined || visited.has(pid)) {
       continue;
     }
     visited.add(pid);
-    const currentRssKb = rssByPid.get(pid);
-    if (currentRssKb !== undefined) {
-      rssKb += currentRssKb;
-      processCount += 1;
+    const record = recordsByPid.get(pid);
+    if (record) {
+      records.push(record);
     }
     for (const childPid of childPidsByParent.get(pid) ?? []) {
       if (!visited.has(childPid)) {
@@ -99,5 +102,21 @@ export function sampleProcessTreeRssKb(rootPid) {
     }
   }
 
+  return records;
+}
+
+export function sampleProcessTreeRssKb(rootPid) {
+  const records = getProcessTreeRecords(rootPid);
+  if (!records) {
+    return null;
+  }
+
+  let rssKb = 0;
+  let processCount = 0;
+  for (const record of records) {
+    rssKb += record.rssKb;
+    processCount += 1;
+  }
+
   return { rssKb, processCount };
 }
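
The refactor above separates collecting the process tree from summing its RSS. A minimal sketch of that split, working on a captured `ps` output string instead of spawning `ps`, might look like this. The function name `collectTree` and the sample output are hypothetical; the column order follows the `pid=`, `ppid=`, `rss=`, `comm=` list in the diff.

```javascript
// Hypothetical helper: parse "pid ppid rss comm" lines and walk the tree
// breadth-first from rootPid, returning one record per reachable process.
function collectTree(psOutput, rootPid) {
  const childrenByParent = new Map();
  const recordsByPid = new Map();
  for (const line of psOutput.split(/\r?\n/u)) {
    const trimmed = line.trim();
    if (!trimmed) {
      continue;
    }
    const [pidRaw, ppidRaw, rssRaw, command = ""] = trimmed.split(/\s+/u, 4);
    const pid = Number.parseInt(pidRaw, 10);
    const parentPid = Number.parseInt(ppidRaw, 10);
    const rssKb = Number.parseInt(rssRaw, 10);
    if (!Number.isInteger(pid) || !Number.isInteger(parentPid) || !Number.isInteger(rssKb)) {
      continue; // skip malformed lines
    }
    const siblings = childrenByParent.get(parentPid) ?? [];
    siblings.push(pid);
    childrenByParent.set(parentPid, siblings);
    recordsByPid.set(pid, { pid, parentPid, rssKb, command });
  }
  if (!recordsByPid.has(rootPid)) {
    return null;
  }
  const records = [];
  const queue = [rootPid];
  const visited = new Set();
  while (queue.length > 0) {
    const pid = queue.shift();
    if (visited.has(pid)) {
      continue;
    }
    visited.add(pid);
    const record = recordsByPid.get(pid);
    if (record) {
      records.push(record);
    }
    queue.push(...(childrenByParent.get(pid) ?? []));
  }
  return records;
}

// Made-up ps output: PID 200 is outside the tree rooted at 100.
const psSample = `
  100     1  50000 node
  101   100 300000 node
  102   100  20000 esbuild
  200     1  99999 node
`;

const treeRecords = collectTree(psSample, 100);
const treeRssKb = treeRecords.reduce((sum, record) => sum + record.rssKb, 0);
console.log(treeRecords.map((record) => record.pid), treeRssKb);
```

One detail worth knowing when reading the diff: in JavaScript the limit argument of `split` caps the number of returned pieces and discards the remainder, so a command name containing spaces would be cut to its first word.
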
@@ -4,7 +4,11 @@ import os from "node:os";
 import path from "node:path";
 import { channelTestPrefixes } from "../vitest.channel-paths.mjs";
 import { isUnitConfigTestFile } from "../vitest.unit-paths.mjs";
-import { parseCompletedTestFileLines, sampleProcessTreeRssKb } from "./test-parallel-memory.mjs";
+import {
+  getProcessTreeRecords,
+  parseCompletedTestFileLines,
+  sampleProcessTreeRssKb,
+} from "./test-parallel-memory.mjs";
 import {
   appendCapturedOutput,
   hasFatalTestRunOutput,
@@ -725,6 +729,25 @@ const memoryTraceEnabled =
   (rawMemoryTrace !== "0" && rawMemoryTrace !== "false" && isCI));
 const memoryTracePollMs = Math.max(250, parseEnvNumber("OPENCLAW_TEST_MEMORY_TRACE_POLL_MS", 1000));
 const memoryTraceTopCount = Math.max(1, parseEnvNumber("OPENCLAW_TEST_MEMORY_TRACE_TOP_COUNT", 6));
+const heapSnapshotIntervalMs = Math.max(
+  0,
+  parseEnvNumber("OPENCLAW_TEST_HEAPSNAPSHOT_INTERVAL_MS", 0),
+);
+const heapSnapshotMinIntervalMs = 5000;
+const heapSnapshotEnabled =
+  process.platform !== "win32" &&
+  heapSnapshotIntervalMs >= heapSnapshotMinIntervalMs;
+const heapSnapshotEnabled = process.platform !== "win32" && heapSnapshotIntervalMs > 0;
+const heapSnapshotSignal = process.env.OPENCLAW_TEST_HEAPSNAPSHOT_SIGNAL?.trim() || "SIGUSR2";
+const heapSnapshotBaseDir = heapSnapshotEnabled
+  ? path.resolve(
+      process.env.OPENCLAW_TEST_HEAPSNAPSHOT_DIR?.trim() ||
+        path.join(os.tmpdir(), `openclaw-heapsnapshots-${Date.now()}`),
+    )
+  : null;
+const ensureNodeOptionFlag = (nodeOptions, flagPrefix, nextValue) =>
+  nodeOptions.includes(flagPrefix) ? nodeOptions : `${nodeOptions} ${nextValue}`.trim();
+const isNodeLikeProcess = (command) => /(?:^|\/)node(?:$|\.exe$)/iu.test(command);
 
 const runOnce = (entry, extraArgs = []) =>
   new Promise((resolve) => {
@@ -757,23 +780,44 @@ const runOnce = (entry, extraArgs = []) =>
       (acc, flag) => (acc.includes(flag) ? acc : `${acc} ${flag}`.trim()),
       nodeOptions,
     );
-    const heapFlag =
+    const heapSnapshotDir =
+      heapSnapshotBaseDir === null ? null : path.join(heapSnapshotBaseDir, entry.name);
+    let resolvedNodeOptions =
       maxOldSpaceSizeMb && !nextNodeOptions.includes("--max-old-space-size=")
-        ? `--max-old-space-size=${maxOldSpaceSizeMb}`
-        : null;
-    const resolvedNodeOptions = heapFlag
-      ? `${nextNodeOptions} ${heapFlag}`.trim()
+        ? `${nextNodeOptions} --max-old-space-size=${maxOldSpaceSizeMb}`.trim()
       : nextNodeOptions;
+    if (heapSnapshotEnabled && heapSnapshotDir) {
+      try {
+        fs.mkdirSync(heapSnapshotDir, { recursive: true });
+      } catch (err) {
+        console.error(`[test-parallel] failed to create heap snapshot dir ${heapSnapshotDir}: ${String(err)}`);
+        resolve(1);
+        return;
+      }
+      resolvedNodeOptions = ensureNodeOptionFlag(
+        resolvedNodeOptions,
+        "--diagnostic-dir=",
+        `--diagnostic-dir=${heapSnapshotDir}`,
+      );
+      resolvedNodeOptions = ensureNodeOptionFlag(
+        resolvedNodeOptions,
+        "--heapsnapshot-signal=",
+        `--heapsnapshot-signal=${heapSnapshotSignal}`,
+      );
+    }
+    }
     let output = "";
     let fatalSeen = false;
     let childError = null;
     let child;
     let pendingLine = "";
     let memoryPollTimer = null;
+    let heapSnapshotTimer = null;
     const memoryFileRecords = [];
     let initialTreeSample = null;
     let latestTreeSample = null;
     let peakTreeSample = null;
+    let heapSnapshotSequence = 0;
     const updatePeakTreeSample = (sample, reason) => {
       if (!sample) {
         return;
@@ -782,6 +826,35 @@ const runOnce = (entry, extraArgs = []) =>
         peakTreeSample = { ...sample, reason };
       }
     };
+    const triggerHeapSnapshot = (reason) => {
+      if (!heapSnapshotEnabled || !child?.pid || !heapSnapshotDir) {
+        return;
+      }
+      const records = getProcessTreeRecords(child.pid) ?? [];
+      const targetPids = records
+        .filter((record) => record.pid !== process.pid && isNodeLikeProcess(record.command))
+        .map((record) => record.pid);
+      if (targetPids.length === 0) {
+        return;
+      }
+      heapSnapshotSequence += 1;
+      let signaledCount = 0;
+      for (const pid of targetPids) {
+        try {
+          process.kill(pid, heapSnapshotSignal);
+          signaledCount += 1;
+        } catch {
+          // Process likely exited between ps sampling and signal delivery.
+        }
+      }
+      if (signaledCount > 0) {
+        console.log(
+          `[test-parallel][heap] ${entry.name} seq=${String(heapSnapshotSequence)} reason=${reason} signaled=${String(
+            signaledCount,
+          )}/${String(targetPids.length)} dir=${heapSnapshotDir}`,
+        );
+      }
+    };
     const captureTreeSample = (reason) => {
       if (!memoryTraceEnabled || !child?.pid) {
         return null;
@@ -877,6 +950,11 @@ const runOnce = (entry, extraArgs = []) =>
           captureTreeSample("poll");
         }, memoryTracePollMs);
       }
+      if (heapSnapshotEnabled) {
+        heapSnapshotTimer = setInterval(() => {
+          triggerHeapSnapshot("interval");
+        }, heapSnapshotIntervalMs);
+      }
     } catch (err) {
       console.error(`[test-parallel] spawn failed: ${String(err)}`);
       resolve(1);
@@ -905,6 +983,9 @@ const runOnce = (entry, extraArgs = []) =>
       if (memoryPollTimer) {
         clearInterval(memoryPollTimer);
       }
+      if (heapSnapshotTimer) {
+        clearInterval(heapSnapshotTimer);
+      }
       children.delete(child);
       const resolvedCode = resolveTestRunExitCode({ code, signal, output, fatalSeen, childError });
       logMemoryTraceSummary();
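
The two small helpers added in this file can be exercised in isolation. The sketch below copies `ensureNodeOptionFlag` and `isNodeLikeProcess` from the diff and runs them on sample inputs; the option strings, directory, and command names are made up for illustration.

```javascript
// Copied from the diff above.
const ensureNodeOptionFlag = (nodeOptions, flagPrefix, nextValue) =>
  nodeOptions.includes(flagPrefix) ? nodeOptions : `${nodeOptions} ${nextValue}`.trim();
const isNodeLikeProcess = (command) => /(?:^|\/)node(?:$|\.exe$)/iu.test(command);

// Start from options that only set the heap limit (illustrative value).
let sketchNodeOptions = "--max-old-space-size=6144";
sketchNodeOptions = ensureNodeOptionFlag(
  sketchNodeOptions,
  "--diagnostic-dir=",
  "--diagnostic-dir=.tmp/heapsnap/unit-fast",
);
sketchNodeOptions = ensureNodeOptionFlag(
  sketchNodeOptions,
  "--heapsnapshot-signal=",
  "--heapsnapshot-signal=SIGUSR2",
);
console.log(sketchNodeOptions);

// A flag that is already present is left untouched, so a caller-supplied value wins.
const keptOptions = ensureNodeOptionFlag(
  "--heapsnapshot-signal=SIGUSR1",
  "--heapsnapshot-signal=",
  "--heapsnapshot-signal=SIGUSR2",
);
console.log(keptOptions);

// Only commands whose final path segment is exactly "node" (or "node.exe") match.
for (const command of ["node", "/usr/local/bin/node", "node.exe", "nodemon", "esbuild"]) {
  console.log(command, isNodeLikeProcess(command));
}
```

Because the presence check is a plain substring match on the flag prefix, the wrapper never overrides a value the caller already placed in the Node options.
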