36 Commits

Author SHA1 Message Date
rabsef-bicrym
ff47876e61
fix: carry observed overflow token counts into compaction (#40357)
Merged via squash.

Prepared head SHA: b99eed4329bda45083cdedc2386c2c4041c034be
Co-authored-by: rabsef-bicrym <52549148+rabsef-bicrym@users.noreply.github.com>
Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com>
Reviewed-by: @jalehman
2026-03-12 06:58:42 -07:00
jnMetaCode
f640326e31
fix(failover): add missing network errno patterns to text-based timeout classifier (#42830)
Merged via squash.

Prepared head SHA: 91761487e8825c0fd6582a762d04bba04f726a85
Co-authored-by: jnMetaCode <12096460+jnMetaCode@users.noreply.github.com>
Co-authored-by: altaywtf <9790196+altaywtf@users.noreply.github.com>
Reviewed-by: @altaywtf
2026-03-12 12:34:44 +03:00
Squabble9
128e5bc317
fix: recognize Venice 402 billing errors for model fallback (#43205)
Merged via squash.

Prepared head SHA: 1f6b10b9d934235e71f279f888292139c4a85aa6
Co-authored-by: Squabble9 <194720422+Squabble9@users.noreply.github.com>
Co-authored-by: altaywtf <9790196+altaywtf@users.noreply.github.com>
Reviewed-by: @altaywtf
2026-03-11 22:15:32 +03:00
ademczuk
58634c9c65
fix(agents): check billing errors before context overflow heuristics (#40409)
Merged via squash.

Prepared head SHA: c88f89c462d87957a4c6c51a23ab997fd307059d
Co-authored-by: ademczuk <5212682+ademczuk@users.noreply.github.com>
Co-authored-by: altaywtf <9790196+altaywtf@users.noreply.github.com>
Reviewed-by: @altaywtf
2026-03-11 21:08:55 +03:00
JiangNan
e9e8b81939
fix(failover): classify Gemini MALFORMED_RESPONSE as retryable timeout (#42292)
Merged via squash.

Prepared head SHA: 68f106ff49fc7a28a806601bc8ca1e5e77c6e8c6
Co-authored-by: jnMetaCode <12096460+jnMetaCode@users.noreply.github.com>
Co-authored-by: altaywtf <9790196+altaywtf@users.noreply.github.com>
Reviewed-by: @altaywtf
2026-03-10 20:34:32 +03:00
CryUshio
8bf64f219a
fix: recognize Poe 402 'used up your points' as billing for fallback (#42278)
Merged via squash.

Prepared head SHA: f3cdfa76dd9afcb023504eef723036e826e6ebc5
Co-authored-by: CryUshio <30655354+CryUshio@users.noreply.github.com>
Co-authored-by: altaywtf <9790196+altaywtf@users.noreply.github.com>
Reviewed-by: @altaywtf
2026-03-10 20:17:36 +03:00
Ayaan Zaidi
9432a8bb3f
test: allowlist detect-secrets fixture strings
2026-03-10 08:14:35 +05:30
alan blount
c9a6c542ef
Add HTTP 499 to transient error codes for model fallback (#41468)
Merged via squash.

Prepared head SHA: 0053bae14038e6df9264df364d1c9aa83d5b698e
Co-authored-by: zeroasterisk <23422+zeroasterisk@users.noreply.github.com>
Co-authored-by: altaywtf <9790196+altaywtf@users.noreply.github.com>
Reviewed-by: @altaywtf
2026-03-10 01:55:10 +03:00
gambletan
8a20f51460
fix: add rate limit patterns for 'too many tokens' and 'tokens per day' (#39377)
Merged via squash.

Prepared head SHA: 132a45728694053c0e3220e7d861508524f17244
Co-authored-by: gambletan <266203672+gambletan@users.noreply.github.com>
Co-authored-by: altaywtf <9790196+altaywtf@users.noreply.github.com>
Reviewed-by: @altaywtf
2026-03-08 13:03:33 +03:00
Peter Lee
92648f9ba9
fix(agents): broaden 402 temporary-limit detection and allow billing cooldown probe (#38533)
Merged via squash.

Prepared head SHA: 282b9186c6f48fcdbf0c81c49f739e5e9ed2df23
Co-authored-by: xialonglee <22994703+xialonglee@users.noreply.github.com>
Co-authored-by: altaywtf <9790196+altaywtf@users.noreply.github.com>
Reviewed-by: @altaywtf
2026-03-08 10:27:01 +03:00
Altay
6e962d8b9e
fix(agents): handle overloaded failover separately (#38301)
* fix(agents): skip auth-profile failure on overload

* fix(agents): note overload auth-profile fallback fix

* fix(agents): classify overloaded failures separately

* fix(agents): back off before overload failover

* fix(agents): tighten overload probe and backoff state

* fix(agents): persist overloaded cooldown across runs

* fix(agents): tighten overloaded status handling

* test(agents): add overload regression coverage

* fix(agents): restore runner imports after rebase

* test(agents): add overload fallback integration coverage

* fix(agents): harden overloaded failover abort handling

* test(agents): tighten overload classifier coverage

* test(agents): cover all-overloaded fallback exhaustion

* fix(cron): retry overloaded fallback summaries

* fix(cron): treat HTTP 529 as overloaded retry
2026-03-07 01:42:11 +03:00
zhouhe-xydt
a65d70f84b
Fix failover for zhipuai 1310 Weekly/Monthly Limit Exhausted (#33813)
Merged via squash.

Prepared head SHA: 3dc441e58de48913720cf7b6137fa761758d8344
Co-authored-by: zhouhe-xydt <265407618+zhouhe-xydt@users.noreply.github.com>
Co-authored-by: altaywtf <9790196+altaywtf@users.noreply.github.com>
Reviewed-by: @altaywtf
2026-03-06 12:04:09 +03:00
Altay
49acb07f9f
fix(agents): classify insufficient_quota 400s as billing (#36783)
2026-03-06 01:17:48 +03:00
Altay
6859619e98
test(agents): add provider-backed failover regressions (#36735)
* test(agents): add provider-backed failover fixtures

* test(agents): cover more provider error docs

* test(agents): tighten provider doc fixtures
2026-03-06 00:42:59 +03:00
jiangnan
029c473727
fix(failover): narrow service-unavailable to require overload indicator (#32828) (#36646)
Merged via squash.

Prepared head SHA: 46fb4306127972d7635f371fd9029fbb9baff236
Co-authored-by: jnMetaCode <12096460+jnMetaCode@users.noreply.github.com>
Co-authored-by: altaywtf <9790196+altaywtf@users.noreply.github.com>
Reviewed-by: @altaywtf
2026-03-06 00:01:57 +03:00
不做了睡大觉
8ac7ce73b3
fix: avoid false global rate-limit classification from generic cooldown text (#32972)
Merged via squash.

Prepared head SHA: 813c16f5afce415da130a917d9ce9f968912b477
Co-authored-by: stakeswky <64798754+stakeswky@users.noreply.github.com>
Co-authored-by: altaywtf <9790196+altaywtf@users.noreply.github.com>
Reviewed-by: @altaywtf
2026-03-05 22:58:21 +03:00
Kai
60a6d11116
fix(embedded): classify model_context_window_exceeded as context overflow, trigger compaction (#35934)
Merged via squash.

Prepared head SHA: 20fa77289c80b2807a6779a3df70440242bc18ca
Co-authored-by: RealKai42 <44634134+RealKai42@users.noreply.github.com>
Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com>
Reviewed-by: @jalehman
2026-03-05 11:30:24 -08:00
AI南柯(KingMo)
30ab9b2068
fix(agents): recognize connection errors as retryable timeout failures (#31697)
* fix(agents): recognize connection errors as retryable timeout failures

## Problem

When a model endpoint becomes unreachable (e.g., local proxy down,
relay server offline), the failover system fails to switch to the
next candidate model. Errors like "Connection error." are not
classified as retryable, causing the session to hang on a broken
endpoint instead of falling back to healthy alternatives.

## Root Cause

Connection/network errors are not recognized by the current failover
classifier:
- Text patterns like "Connection error.", "fetch failed", "network error"
- Error codes like ECONNREFUSED, ENOTFOUND, EAI_AGAIN (in message text)

While `failover-error.ts` handles these as error codes (err.code),
it misses them when they appear as plain text in error messages.

## Solution

Extend timeout error patterns to include connection/network failures:

**In `errors.ts` (ERROR_PATTERNS.timeout):**
- Text: "connection error", "network error", "fetch failed", etc.
- Regex: /\beconn(?:refused|reset|aborted)\b/i, /\benotfound\b/i, /\beai_again\b/i

**In `failover-error.ts` (TIMEOUT_HINT_RE):**
- Same patterns for non-assistant error paths

## Testing

Added test cases covering:
- "Connection error."
- "fetch failed"
- "network error: ECONNREFUSED"
- "ENOTFOUND" / "EAI_AGAIN" in message text

## Impact

- **Compatibility:** High - only expands retryable error detection
- **Behavior:** Connection failures now trigger automatic fallback
- **Risk:** Low - changes are additive and well-tested

* style: fix code formatting for test file
2026-03-03 02:37:23 +00:00
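The text-based detection described in the commit above (#31697) could look roughly like the sketch below. The phrases and regexes are taken from the commit message; the constant and function names are hypothetical and are not claimed to match the contents of `errors.ts` or `failover-error.ts`.

```typescript
// Hypothetical names; only the phrases and regexes come from the commit message.
const CONNECTION_TEXT_HINTS: readonly string[] = [
  "connection error",
  "network error",
  "fetch failed",
];

const CONNECTION_ERRNO_PATTERNS: readonly RegExp[] = [
  /\beconn(?:refused|reset|aborted)\b/i,
  /\benotfound\b/i,
  /\beai_again\b/i,
];

// Returns true when the message text looks like a connection/network failure,
// which the commit treats as a retryable timeout.
function looksLikeConnectionFailure(message: string): boolean {
  const lower = message.toLowerCase();
  if (CONNECTION_TEXT_HINTS.some((hint) => lower.includes(hint))) {
    return true;
  }
  // The regexes are case-insensitive and have no `g` flag, so test() is stateless.
  return CONNECTION_ERRNO_PATTERNS.some((re) => re.test(message));
}
```

With this sketch, `looksLikeConnectionFailure("network error: ECONNREFUSED")` matches on both the phrase and the errno pattern, while an unrelated message such as a rate-limit error matches neither.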
Peter Steinberger
1bd20dbdb6
fix(failover): treat stop reason error as timeout
2026-03-03 01:05:24 +00:00
Peter Steinberger
a2fdc3415f
fix(failover): handle unhandled stop reason error
2026-03-03 01:05:24 +00:00
Peter Steinberger
250f9e15f5
fix(agents): land #31007 from @HOYALIM
Co-authored-by: Ho Lim <subhoya@gmail.com>
2026-03-02 01:06:00 +00:00
Aleksandrs Tihenko
c0026274d9
fix(auth): distinguish revoked API keys from transient auth errors (#25754)
Merged via /review-pr -> /prepare-pr -> /merge-pr.

Prepared head SHA: 8f9c07a200644284e11adae76368adab40c5fa4e
Co-authored-by: rrenamed <87486610+rrenamed@users.noreply.github.com>
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Reviewed-by: @gumadeiras
2026-02-25 19:47:16 -05:00
Peter Steinberger
d2597d5ecf
fix(agents): harden model fallback failover paths
2026-02-25 03:46:34 +00:00
Peter Steinberger
43f318cd9a
fix(agents): reduce billing false positives on long text (#25680)
Land PR #25680 from @lairtonlelis.
Retain explicit status/code/http 402 detection for oversized structured payloads.

Co-authored-by: Ailton <lairton@telnyx.com>
2026-02-25 01:22:17 +00:00
Peter Machona
9ced64054f
fix(auth): classify missing OAuth scopes as auth failures (#24761)
2026-02-24 03:33:44 +00:00
Clawborn
544809b6f6
Add Chinese context overflow patterns to isContextOverflowError (#22855)
Proxy providers returning Chinese error messages (e.g. Chinese LLM
gateways) use patterns like '上下文过长' or '上下文超出' that are not
matched by the existing English-only patterns in isContextOverflowError.
This prevents auto-compaction from triggering, leaving the session stuck.

Add the most common Chinese proxy patterns:
- 上下文过长 (context too long)
- 上下文超出 (context exceeded)
- 上下文长度超 (context length exceeds)
- 超出最大上下文 (exceeds maximum context)
- 请压缩上下文 (please compress context)

Chinese characters are unaffected by toLowerCase() so check the
original message directly.

Closes #22849
2026-02-23 10:54:24 -05:00
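A minimal sketch of the check described in the commit above (#22855), assuming a plain substring match. The five patterns are copied from the commit message; the constant and helper names are hypothetical, and the real `isContextOverflowError` is not reproduced here.

```typescript
// Patterns listed in the commit message; names below are illustrative only.
const CHINESE_OVERFLOW_HINTS: readonly string[] = [
  "上下文过长", // context too long
  "上下文超出", // context exceeded
  "上下文长度超", // context length exceeds
  "超出最大上下文", // exceeds maximum context
  "请压缩上下文", // please compress context
];

// Checks the original message directly: toLowerCase() leaves these
// characters unchanged, so no case normalization is needed.
function hasChineseOverflowHint(message: string): boolean {
  return CHINESE_OVERFLOW_HINTS.some((hint) => message.includes(hint));
}
```

In this sketch a gateway error containing any of the listed phrases is flagged, and English-only messages fall through to whatever existing patterns apply.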
Vincent Koc
4f340b8812
fix(agents): avoid classifying reasoning-required errors as context overflow (#24593)
* Agents: exclude reasoning-required errors from overflow detection

* Tests: cover reasoning-required overflow classification guard

* Tests: format reasoning-required endpoint errors
2026-02-23 10:38:49 -05:00
青雲
69692d0d3a
fix: detect additional context overflow error patterns to prevent leak to user (#20539)
* fix: detect additional context overflow error patterns to prevent leak to user

Fixes #9951

The error 'input length and max_tokens exceed context limit: 170636 +
34048 > 200000' was not caught by isContextOverflowError() and leaked
to users via formatAssistantErrorText()'s invalidRequest fallback.

Add three new patterns to isContextOverflowError():
- 'exceed context limit' (direct match)
- 'exceeds the model\'s maximum context'
- max_tokens/input length + exceed + context (compound match)

These are now rewritten to the friendly context overflow message.

* Overflow: add regression tests and changelog credits

* Update CHANGELOG.md

* Update pi-embedded-helpers.isbillingerrormessage.test.ts

---------

Co-authored-by: echoVic <AkiraVic@outlook.com>
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-02-23 10:03:56 -05:00
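The three patterns added in the commit above (#20539) can be sketched as follows. This is one reading of the commit message, not the actual body of `isContextOverflowError()`; the function name is hypothetical and the compound rule is interpreted as "mentions `max_tokens` or `input length`, plus `exceed`, plus `context`".

```typescript
// Hypothetical helper covering only the three patterns named in the commit.
function matchesAddedOverflowPatterns(message: string): boolean {
  const lower = message.toLowerCase();

  // Direct matches.
  if (lower.includes("exceed context limit")) return true;
  if (lower.includes("exceeds the model's maximum context")) return true;

  // Compound match: a size term together with "exceed" and "context".
  const mentionsSize =
    lower.includes("max_tokens") || lower.includes("input length");
  return mentionsSize && lower.includes("exceed") && lower.includes("context");
}
```

Under these assumptions the message quoted in the commit, `input length and max_tokens exceed context limit: 170636 + 34048 > 200000`, is caught by the first direct match, while a message that mentions `max_tokens` without `exceed` and `context` is not.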
Peter Steinberger
9bd04849ed
fix(agents): detect Kimi model-token-limit overflows
Co-authored-by: Danilo Falcão <danilo@falcao.org>
2026-02-23 12:44:23 +00:00
taw0002
3c57bf4c85
fix: treat HTTP 502/503/504 as failover-eligible (timeout reason) (#21017)
* fix: treat HTTP 502/503/504 as failover-eligible (timeout reason)

When a model API returns 502 Bad Gateway, 503 Service Unavailable, or
504 Gateway Timeout, the error object carries the status code directly.
resolveFailoverReasonFromError() only checked 402/429/401/403/408/400,
so 5xx server errors fell through to message-based classification which
requires the status code to appear at the start of the error message.

Many API SDKs (Google, Anthropic) set err.status = 503 without prefixing
the message with '503', so the message classifier never matched and
failover never triggered — the run retried the same broken model.

Add 502/503/504 to the status-code branch, returning 'timeout' (matching
the existing behavior of isTransientHttpError in the message classifier).

Fixes #20999

* Changelog: add failover 502/503/504 note with credits

* Failover: classify HTTP 504 as transient in message parser

* Changelog: credit taw0002 and vincentkoc for failover fix

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-02-23 03:01:57 -05:00
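The status-code branch described in the commit above (#21017) might be sketched like this. Only the 502/503/504 to `timeout` mapping is taken from the commit message; the function name is hypothetical, and the handling of the other status codes in `resolveFailoverReasonFromError()` is deliberately left out because the commit does not describe it.

```typescript
// Hypothetical helper: reads a numeric `status` off an error-like object and
// maps gateway/server failures to the "timeout" failover reason.
function timeoutReasonFromStatus(err: unknown): "timeout" | null {
  if (typeof err !== "object" || err === null) return null;
  const status = (err as { status?: unknown }).status;
  if (status === 502 || status === 503 || status === 504) {
    return "timeout";
  }
  return null;
}

// An SDK-style error that sets err.status without prefixing the message.
const sdkError = Object.assign(new Error("Service Unavailable"), { status: 503 });
const reason = timeoutReasonFromStatus(sdkError);
```

Here `reason` is `"timeout"` even though the message text never contains `503`, which is the case the commit says the message classifier missed.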
Peter Steinberger
4267fc8593
test: reclassify pi embedded helper suites out of agents e2e
2026-02-22 10:53:50 +00:00
Peter Steinberger
9131b22a28
test: migrate suites to e2e coverage layout
2026-02-13 14:28:22 +00:00
0xRain
4f329f923c
fix(agents): narrow billing error 402 regex to avoid false positives on issue IDs (#13827)
Merged via /review-pr -> /prepare-pr -> /merge-pr.

Prepared head SHA: b0501bbab7b3ec3ed56eb8903d9a27f8273f0edb
Co-authored-by: 0xRaini <190923101+0xRaini@users.noreply.github.com>
Co-authored-by: sebslight <19554889+sebslight@users.noreply.github.com>
Reviewed-by: @sebslight
2026-02-12 09:18:06 -05:00
quotentiroler
039aaf176e
CI: cleanup and fix broken job references
- Fix code-size -> code-analysis job name (5 jobs had wrong dependency)
- Remove useless install-check job (was no-op)
- Add explicit docs_only guard to release-check
- Remove dead submodule checkout steps (no submodules in repo)
- Rename detect-docs-only -> detect-docs-changes, add docs_changed output
- Reorder check script: format first for faster fail
- Fix billing error test (PR #12946 removed fallback detection but not test)
2026-02-09 17:52:51 -08:00
Peter Steinberger
c379191f80
chore: migrate to oxlint and oxfmt
Co-authored-by: Christoph Nakazawa <christoph.pojer@gmail.com>
2026-01-14 15:02:19 +00:00
Peter Steinberger
bcbfb357be
refactor(src): split oversized modules
2026-01-14 01:17:56 +00:00