fix: don't consume replyToMode=first slot for compaction notices

Compaction start/end notices are transient status messages that should
be threaded (appear in-context) but must not advance the hasThreaded
flag inside createReplyToModeFilter when mode=first.

Before this fix, the compaction start notice counted as the "first"
threaded message and set hasThreaded, so all real assistant reply chunks
that followed had replyToId stripped and were sent as unthreaded
top-level messages.

Fix: skip advancing hasThreaded when payload.isCompactionNotice is true.
The notice still receives replyToId (so it appears in the thread), but
the filter's stateful "first" slot is preserved for the actual assistant
reply that follows.
zidongdesign 2026-03-08 14:36:27 +08:00 committed by Josh Lehman
parent e7fd0a7b21
commit 1e381c6c8c

@@ -44,7 +44,13 @@ export function createReplyToModeFilter(
     if (hasThreaded) {
       return { ...payload, replyToId: undefined };
     }
-    hasThreaded = true;
+    // Compaction notices are transient status messages — they should be
+    // threaded (so they appear in-context), but they must not consume the
+    // "first" slot of the replyToMode=first filter. Skip advancing
+    // hasThreaded so the real assistant reply still gets replyToId.
+    if (!payload.isCompactionNotice) {
+      hasThreaded = true;
+    }
     return payload;
   };
 }
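
The effect of the change can be illustrated with a minimal, self-contained sketch. It models only the mode=first branch shown in the hunk above. The `ReplyPayload` shape, the `createFirstModeFilter` name, and the sample messages are assumptions made for illustration, not the project's actual types or API.

```typescript
// Hypothetical payload shape: only replyToId and isCompactionNotice
// are named in the commit; `text` is added for readability.
interface ReplyPayload {
  text: string;
  replyToId?: string;
  isCompactionNotice?: boolean;
}

// Simplified stand-in for the mode=first branch of createReplyToModeFilter.
function createFirstModeFilter(): (payload: ReplyPayload) => ReplyPayload {
  let hasThreaded = false;
  return (payload: ReplyPayload): ReplyPayload => {
    if (hasThreaded) {
      // The "first" slot is already used: send as a top-level message.
      return { ...payload, replyToId: undefined };
    }
    // Compaction notices stay threaded but do not consume the slot.
    if (!payload.isCompactionNotice) {
      hasThreaded = true;
    }
    return payload;
  };
}

const filter = createFirstModeFilter();
const payloads: ReplyPayload[] = [
  { text: "Compacting context", replyToId: "msg-1", isCompactionNotice: true },
  { text: "Here is the answer", replyToId: "msg-1" },
  { text: "(continued)", replyToId: "msg-1" },
];
const results: ReplyPayload[] = payloads.map((p) => filter(p));
```

In this sketch the notice and the first real reply both keep `replyToId`, while the follow-up chunk has it stripped. With the old unconditional `hasThreaded = true`, the notice would have taken the slot and both real replies would have been sent unthreaded.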