25 changes: 17 additions & 8 deletions source/client-backpressure/client-backpressure.md
@@ -129,12 +129,16 @@ rules:
    - To retry `runCommand`, both [retryWrites](../retryable-writes/retryable-writes.md#retrywrites) and
      [retryReads](../retryable-reads/retryable-reads.md#retryreads) MUST be enabled. See
      [Why must both `retryWrites` and `retryReads` be enabled to retry runCommand?](client-backpressure.md#why-must-both-retrywrites-and-retryreads-be-enabled-to-retry-runcommand)
3. If the request is eligible for retry (as outlined in step 2 above), the client MUST apply exponential backoff
    according to the following formula: `backoff = jitter * min(MAX_BACKOFF, BASE_BACKOFF * 2^(attempt - 1))`
    - `jitter` is a random jitter value between 0 and 1.
    - `BASE_BACKOFF` is constant 100ms.
    - `MAX_BACKOFF` is 10000ms.
    - This results in delays of 100ms and 200ms before accounting for jitter.
3. If the request is eligible for retry (as outlined in step 2 above), the client MUST apply backoff according to the
    following formula: `backoff = jitter * min(MAX_BACKOFF, BASE_BACKOFF * 2^(attempt - 1))`
    1. `jitter` is a random jitter value between 0 and 1.
    2. `MAX_BACKOFF` is 10000ms.
    3. `BASE_BACKOFF` is constant 100ms.
    4. `attempt` is the retry number. The first retry is `attempt = 1`, the second is `attempt = 2`, and so on.
    5. This results in delays of 100ms and 200ms before accounting for jitter.
    6. If `retryAfterMS` is present on the error and has a positive value, the client MUST use that value instead of
        `BASE_BACKOFF`. `retryAfterMS` represents a server-supplied base backoff to use in place of the driver's
        default.
4. If the request is eligible for retry (as outlined in step 2 above) and `enableOverloadRetargeting` is enabled, the
    client MUST add the previously used server's address to the list of deprioritized server addresses for
    [server selection](../server-selection/server-selection.md). Drivers MUST expose `enableOverloadRetargeting` as a
@@ -214,8 +218,11 @@ def execute_command_retryable(command, ...):

if is_overload:
    jitter = random.random()  # Random float between [0.0, 1.0).
    backoff = jitter * min(MAX_BACKOFF, BASE_BACKOFF * 2 ** (attempt - 1))

    retry_after = BASE_BACKOFF
    # If present on the error, retryAfterMS overrides the base backoff
    if exc.retry_after_ms:
        retry_after = exc.retry_after_ms / 1000  # Convert from milliseconds to seconds
    backoff = jitter * min(MAX_BACKOFF, retry_after * 2 ** (attempt - 1))
    # If the delay exceeds the deadline, bail early.
    if _csot.get_timeout():
        if time.monotonic() + backoff > _csot.get_deadline():
@@ -433,6 +440,8 @@ to understand and configure.

## Changelog

- 2026-06-16: Add support for retryAfterMS backoff calculation.

- 2026-04-14: Clarify correct retry behavior when a mix of overload and non-overload errors are encountered.

- 2026-03-30: Introduce phase 1 support without token buckets.
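
As a minimal sketch of the backoff rule in step 3 of the change above (not the specification's reference implementation), the calculation can be written as a small function. The name `compute_backoff_ms`, its parameters, and the choice to work in milliseconds throughout are assumptions made for illustration; `jitter` is injectable only so that the result is reproducible.

```python
import random

BASE_BACKOFF_MS = 100     # default base backoff from the spec text
MAX_BACKOFF_MS = 10_000   # cap applied before jitter


def compute_backoff_ms(attempt, retry_after_ms=None, jitter=None):
    """Return the delay in milliseconds before retry number `attempt` (1-based).

    Hypothetical helper: a positive `retry_after_ms` replaces the default
    base backoff; a missing, zero, or negative value leaves the default.
    """
    if attempt < 1:
        raise ValueError("attempt is 1-based: the first retry is attempt 1")
    if jitter is None:
        jitter = random.random()  # random float in [0.0, 1.0)
    base = BASE_BACKOFF_MS
    if retry_after_ms is not None and retry_after_ms > 0:
        base = retry_after_ms
    return jitter * min(MAX_BACKOFF_MS, base * 2 ** (attempt - 1))
```

With `jitter` pinned to `1.0`, the first two retries yield 100 ms and 200 ms by default, and 50 ms and 100 ms when the error carries `retryAfterMS = 50`.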
59 changes: 59 additions & 0 deletions source/client-backpressure/tests/README.md
@@ -119,3 +119,62 @@ option.
5. Assert that the raised error contains both the `RetryableError` and `SystemOverloadedError` error labels.

6. Assert that the total number of started commands is `maxAdaptiveRetries` + 1 (2).

#### Test 5: Overload Errors with retryAfterMS override base backoff

Drivers SHOULD test that overload errors with `retryAfterMS` override the default backoff duration. This test MUST be
executed against a MongoDB 9.0+ server that has enabled the `configureFailPoint` command with the `errorLabels` option.

1. Let `client` be a `MongoClient`.

2. Let `coll` be a collection.

3. Configure the random number generator used for exponential backoff jitter to always return a number as close as
possible to `1`.

4. Configure the following failPoint:

```javascript
{
  configureFailPoint: 'failCommand',
  mode: 'alwaysOn',
  data: {
    failCommands: ['insert'],
    errorCode: 462,
    errorLabels: ['SystemOverloadedError', 'RetryableError']
  }
}
```

5. Insert the document `{ a: 1 }`. Expect that the command errors. Measure the duration of the command execution.

```javascript
const start = performance.now();
expect(
  await coll.insertOne({ a: 1 }).catch(e => e)
).to.be.an.instanceof(MongoServerError);
const end = performance.now();
```

6. Run the following command to set up `retryAfterMS` on overload errors.

```python
client.admin.command("setParameter", 1, overloadRetryAfterMS=50)
```

7. Execute step 5 again.

8. Run the following command to disable `retryAfterMS` on overload errors.

```python
client.admin.command("setParameter", 1, overloadRetryAfterMS=0)
```

9. Compare the time between the two runs.

```python
assertTrue(absolute_value(exponential_backoff_time - (with_retry_after_ms_time + 0.2 seconds)) < 0.2 seconds)
```

The difference in the backoffs is 0.2 seconds. There is a 0.2-second window to account for potential variance between
the two runs.
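
The measurement and comparison in steps 5 through 9 can be sketched in Python roughly as follows. This is an illustrative helper, not part of the test specification: `measure_seconds` and `durations_match` are hypothetical names, and the 0.2-second expected difference and 0.2-second window are taken from the text above as default parameters.

```python
import time


def measure_seconds(operation):
    """Run `operation` and return (elapsed_seconds, error_or_None).

    The error is captured rather than raised because the test expects
    the insert to fail once its retries are exhausted.
    """
    start = time.monotonic()
    error = None
    try:
        operation()
    except Exception as exc:  # a failing command is the expected path here
        error = exc
    return time.monotonic() - start, error


def durations_match(default_backoff_s, with_retry_after_s,
                    expected_diff_s=0.2, window_s=0.2):
    """Mirror the step 9 comparison: the default-backoff run should exceed
    the retryAfterMS run by about `expected_diff_s`, within `window_s`."""
    return abs(default_backoff_s - (with_retry_after_s + expected_diff_s)) < window_s
```

A driver test might call `measure_seconds` around the insert once before and once after setting `overloadRetryAfterMS`, check that both calls captured an error, and then assert `durations_match` on the two elapsed times.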
7 changes: 6 additions & 1 deletion source/mongodb-handshake/handshake.md
@@ -55,6 +55,10 @@ Drivers MUST use the `OP_MSG` protocol for all handshakes if their minWireVersio
MUST use legacy hello for the first message of the initial handshake, and include `helloOk:true` in the handshake
request.

Drivers MUST include `backpressure: "2"` in their handshake request to declare which version of the client backpressure
specification they support. The value of `backpressure` MUST be the string `"2"`, not the number `2`.

If the legacy handshake response includes `helloOk: true`, then subsequent topology monitoring commands MUST use the
`hello` command. If the legacy handshake response does not include `helloOk: true`, then subsequent topology monitoring
commands MUST use the legacy hello command. Additionally, note that if the server does not understand `OP_MSG`, the
@@ -85,7 +89,7 @@ if stable_api_configured or client_options.load_balanced:
    cmd = {"hello": 1}
else:
    cmd = {"legacy hello": 1, "helloOk": 1}
cmd["backpressure"] = True
cmd["backpressure"] = "2"
cmd["client"] = client_metadata
if client_options.compressors:
    cmd["compression"] = client_options.compressors
@@ -561,6 +565,7 @@ support the `hello` command, the `helloOk: true` argument is ignored and the leg

## Changelog

- 2026-06-25: Clarify the client backpressure component of the handshake.
- 2026-06-11: Clarify that there is no new behavior as a result of only using OP_MSG for all handshakes.
- 2026-06-05: Use OP_MSG for all handshakes.
- 2025-09-04: Clarify that drivers do not append the same metadata multiple times.
4 changes: 2 additions & 2 deletions source/mongodb-handshake/tests/README.md
@@ -499,7 +499,7 @@ Before each test case, perform the setup.

8. Assert that `initialClientMetadata` is identical to `updatedClientMetadata`.

### Test 9: Handshake documents include `backpressure: true`
### Test 9: Handshake documents include `backpressure: "2"`

These tests require a mechanism for observing handshake documents sent to the server.

@@ -511,4 +511,4 @@ These tests require a mechanism for observing handshake documents sent to the se

3. Assert that for every handshake document intercepted:

1. The document has a field `backpressure` whose value is `true`.
1. The document has a field `backpressure` whose value is `"2"`.
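
The assertion in step 3 could be sketched as below, assuming the intercepted handshake documents are available as plain dictionaries. `assert_backpressure_versioned` is a hypothetical helper name; the explicit type check is there because both the old boolean value and a numeric `2` have to be rejected.

```python
def assert_backpressure_versioned(handshake_docs, expected="2"):
    """Assert that every intercepted handshake document has backpressure: "2".

    The value must be a string; True or the number 2 counts as a failure.
    """
    for index, doc in enumerate(handshake_docs):
        assert "backpressure" in doc, f"document {index} has no backpressure field"
        value = doc["backpressure"]
        assert isinstance(value, str) and value == expected, (
            f"document {index}: expected backpressure {expected!r}, got {value!r}"
        )
```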