diff --git a/source/client-backpressure/client-backpressure.md b/source/client-backpressure/client-backpressure.md
index 6aea3d2aaa..610279123a 100644
--- a/source/client-backpressure/client-backpressure.md
+++ b/source/client-backpressure/client-backpressure.md
@@ -129,12 +129,16 @@ rules:
    - To retry `runCommand`, both [retryWrites](../retryable-writes/retryable-writes.md#retrywrites) and
      [retryReads](../retryable-reads/retryable-reads.md#retryreads) MUST be enabled. See
      [Why must both `retryWrites` and `retryReads` be enabled to retry runCommand?](client-backpressure.md#why-must-both-retrywrites-and-retryreads-be-enabled-to-retry-runcommand)
-3. If the request is eligible for retry (as outlined in step 2 above), the client MUST apply exponential backoff
-   according to the following formula: `backoff = jitter * min(MAX_BACKOFF, BASE_BACKOFF * 2^(attempt - 1))`
-   - `jitter` is a random jitter value between 0 and 1.
-   - `BASE_BACKOFF` is constant 100ms.
-   - `MAX_BACKOFF` is 10000ms.
-   - This results in delays of 100ms and 200ms before accounting for jitter.
+3. If the request is eligible for retry (as outlined in step 2 above), the client MUST apply backoff according to the
+   following formula: `backoff = jitter * min(MAX_BACKOFF, BASE_BACKOFF * 2^(attempt - 1))`
+   1. `jitter` is a random jitter value between 0 and 1.
+   2. `MAX_BACKOFF` is 10000ms.
+   3. `BASE_BACKOFF` is a constant 100ms.
+   4. `attempt` is the retry number. The first retry is `attempt = 1`, the second is `attempt = 2`, and so on.
+   5. This results in delays of 100ms and 200ms before accounting for jitter.
+   6. If `retryAfterMS` is present on the error and has a positive value, the client MUST use that value instead of
+      `BASE_BACKOFF`. `retryAfterMS` represents a server-supplied base backoff to use in place of the driver's
+      default.
 4. If the request is eligible for retry (as outlined in step 2 above) and `enableOverloadRetargeting` is enabled, the
    client MUST add the previously used server's address to the list of deprioritized server addresses for
    [server selection](../server-selection/server-selection.md). Drivers MUST expose `enableOverloadRetargeting` as a
@@ -214,8 +218,11 @@ def execute_command_retryable(command, ...):
 
             if is_overload:
                 jitter = random.random()  # Random float between [0.0, 1.0).
-                backoff = jitter * min(MAX_BACKOFF, BASE_BACKOFF * 2 ** (attempt - 1))
-
+                retry_after = BASE_BACKOFF
+                # If present on the error and positive, retryAfterMS overrides the base backoff
+                if exc.retry_after_ms is not None and exc.retry_after_ms > 0:
+                    retry_after = exc.retry_after_ms / 1000  # Convert from milliseconds to seconds
+                backoff = jitter * min(MAX_BACKOFF, retry_after * 2 ** (attempt - 1))
                 # If the delay exceeds the deadline, bail early.
                 if _csot.get_timeout():
                     if time.monotonic() + backoff > _csot.get_deadline():
@@ -433,6 +440,8 @@ to understand and configure.
 
 ## Changelog
 
+- 2026-06-16: Add support for retryAfterMS backoff calculation.
+
 - 2026-04-14: Clarify correct retry behavior when a mix of overload and non-overload errors are encountered.
 
 - 2026-03-30: Introduce phase 1 support without token buckets.
diff --git a/source/client-backpressure/tests/README.md b/source/client-backpressure/tests/README.md
index 17becefd0a..e1c47f2d3d 100644
--- a/source/client-backpressure/tests/README.md
+++ b/source/client-backpressure/tests/README.md
@@ -119,3 +119,62 @@ option.
 5. Assert that the raised error contains both the `RetryableError` and `SystemOverloadedError` error labels.
 
 6. Assert that the total number of started commands is `maxAdaptiveRetries` + 1 (2).
+
+#### Test 5: Overload Errors with retryAfterMS override base backoff
+
+Drivers SHOULD test that overload errors with `retryAfterMS` override the default backoff duration. This test MUST be
+executed against a MongoDB 9.0+ server that has enabled the `configureFailPoint` command with the `errorLabels` option.
+
+1. Let `client` be a `MongoClient`.
+
+2. Let `coll` be a collection.
+
+3. Configure the random number generator used for exponential backoff jitter to always return a number as close as
+   possible to `1`.
+
+4. Configure the following failPoint:
+
+   ```javascript
+   {
+     configureFailPoint: 'failCommand',
+     mode: 'alwaysOn',
+     data: {
+       failCommands: ['insert'],
+       errorCode: 462,
+       errorLabels: ['SystemOverloadedError', 'RetryableError']
+     }
+   }
+   ```
+
+5. Insert the document `{ a: 1 }`. Expect that the command errors. Measure the duration of the command execution.
+
+   ```javascript
+   const start = performance.now();
+   expect(
+     await coll.insertOne({ a: 1 }).catch(e => e)
+   ).to.be.an.instanceof(MongoServerError);
+   const end = performance.now();
+   ```
+
+6. Run the following command to set up `retryAfterMS` on overload errors.
+
+   ```python
+   client.admin.command("setParameter", 1, overloadRetryAfterMS=50)
+   ```
+
+7. Execute step 5 again.
+
+8. Run the following command to disable `retryAfterMS` on overload errors.
+
+   ```python
+   client.admin.command("setParameter", 1, overloadRetryAfterMS=0)
+   ```
+
+9. Compare the step 5 duration (`exponential_backoff_time`) with the step 7 duration (`with_retry_after_ms_time`).
+
+   ```python
+   assertTrue(absolute_value(exponential_backoff_time - (with_retry_after_ms_time + 0.2 seconds)) < 0.2 seconds)
+   ```
+
+   The difference in the backoffs is 0.2 seconds. There is a 0.2-second window to account for potential variance between
+   the two runs.
diff --git a/source/mongodb-handshake/handshake.md b/source/mongodb-handshake/handshake.md
index e8d39bf6b9..07c326b5d4 100644
--- a/source/mongodb-handshake/handshake.md
+++ b/source/mongodb-handshake/handshake.md
@@ -55,6 +55,9 @@ Drivers MUST use the `OP_MSG` protocol for all handshakes if their minWireVersio
 MUST use legacy hello for the first message of the initial handshake, and include `helloOk:true` in the handshake
 request.
 
+Drivers MUST include `backpressure: "2"` in their handshake request to explicitly declare which version of the client
+backpressure specification they support. The value of `backpressure` MUST be the string `"2"`, not the number `2`.
+
 If the legacy handshake response includes `helloOk: true`, then subsequent topology monitoring commands MUST use the
 `hello` command. If the legacy handshake response does not include `helloOk: true`, then subsequent topology monitoring
 commands MUST use the legacy hello command. Additionally, note that if the server does not understand `OP_MSG`, the
@@ -85,7 +88,7 @@ if stable_api_configured or client_options.load_balanced:
     cmd = {"hello": 1}
 else:
     cmd = {"legacy hello": 1, "helloOk": 1}
-cmd["backpressure"] = True
+cmd["backpressure"] = "2"
 cmd["client"] = client_metadata
 if client_options.compressors:
     cmd["compression"] = client_options.compressors
@@ -561,6 +564,7 @@ support the `hello` command, the `helloOk: true` argument is ignored and the leg
 
 ## Changelog
 
+- 2026-06-25: Clarify the client backpressure component of the handshake.
 - 2026-06-11: Clarify that there is no new behavior as a result of only using OP_MSG for all handshakes.
 - 2026-06-05: Use OP_MSG for all handshakes.
 - 2025-09-04: Clarify that drivers do not append the same metadata multiple times.
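The handshake pseudocode change can be expanded into a small self-contained Python function. This is a sketch only. `build_handshake` and its parameters are hypothetical stand-ins for a driver's own option handling, and a plain `dict` stands in for the BSON document a real driver would encode and send.

```python
from typing import Any, Dict, List, Optional

# Version of the client backpressure specification the driver declares.
# Sent as a string, never as the number 2 or the boolean True.
BACKPRESSURE_VERSION = "2"


def build_handshake(
    client_metadata: Dict[str, Any],
    stable_api_configured: bool = False,
    load_balanced: bool = False,
    compressors: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Assemble the initial handshake command (hypothetical helper)."""
    if stable_api_configured or load_balanced:
        cmd: Dict[str, Any] = {"hello": 1}
    else:
        # "legacy hello" is the spec's placeholder name for the legacy command.
        cmd = {"legacy hello": 1, "helloOk": 1}
    cmd["backpressure"] = BACKPRESSURE_VERSION
    cmd["client"] = client_metadata
    if compressors:
        cmd["compression"] = compressors
    return cmd
```

Because the value is a string, a check such as `cmd["backpressure"] == "2"` rejects both the previous boolean `True` and a numeric `2`.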
diff --git a/source/mongodb-handshake/tests/README.md b/source/mongodb-handshake/tests/README.md
index 296c87ee02..ed56e5dc46 100644
--- a/source/mongodb-handshake/tests/README.md
+++ b/source/mongodb-handshake/tests/README.md
@@ -499,7 +499,7 @@ Before each test case, perform the setup.
 
 8. Assert that `initialClientMetadata` is identical to `updatedClientMetadata`.
 
-### Test 9: Handshake documents include `backpressure: true`
+### Test 9: Handshake documents include `backpressure: "2"`
 
 These tests require a mechanism for observing handshake documents sent to the server.
 
@@ -511,4 +511,4 @@ These tests require a mechanism for observing handshake documents sent to the se
 
 3. Assert that for every handshake document intercepted:
 
-   1. The document has a field `backpressure` whose value is `true`.
+   1. The document has a field `backpressure` whose value is `"2"`.
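A test suite might factor the assertion in step 3 of Test 9 into a helper like the following. This is a hypothetical sketch. `assert_backpressure_field` is not part of any driver API, and the intercepted documents are assumed to have already been decoded into Python mappings by whatever observation mechanism the driver's test harness provides. The check that at least one document was intercepted is an extra safeguard that the test text does not state.

```python
from typing import Any, Iterable, Mapping


def assert_backpressure_field(handshakes: Iterable[Mapping[str, Any]]) -> int:
    """Check Test 9's assertion on every intercepted handshake document.

    Returns the number of documents checked. Raises AssertionError on the
    first document whose `backpressure` field is missing or not the string "2".
    """
    count = 0
    for doc in handshakes:
        count += 1
        assert "backpressure" in doc, f"handshake {count} has no backpressure field"
        value = doc["backpressure"]
        # In Python, 2 == "2" and True == "2" are both False, so this
        # comparison alone rejects the number and the old boolean value.
        assert value == "2", f"handshake {count} has backpressure={value!r}, expected '2'"
    assert count > 0, "no handshake documents were intercepted"
    return count
```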