Commit 3613bb3

satishkotha and claude authored
S3 batch delete: log per-object failures and add a failure counter (#3257)
S3 DeleteObjects always returns HTTP 200 with per-object errors in the XML response body. Today, when a sub-op fails inside a batch, the failure is captured into the response payload but is never logged or counted on the frontend, so partial failures are invisible to operators unless every response body is parsed offline.

Two changes:

- Log every per-object failure at WARN with key, error code, the parent request URI, and the exception (stack trace included via SLF4J's trailing-throwable convention). Operators can now grep 'S3 batch delete sub-op failed' in ambry-frontend.log to find failed deletes inside otherwise-200 batches.
- Add s3BatchDeleteSubOpFailureCount Counter to FrontendMetrics. Increments per failed sub-op. Lets dashboards/alerts track partial-failure rate without log scraping.

No protocol or response-body change. The handler's HTTP behavior is unchanged: still returns 200 with the same DeleteResult XML containing deleted/error lists.

Existing S3BatchDeleteHandlerTest (6 tests) passes.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 7273f9f commit 3613bb3

2 files changed

Lines changed: 14 additions & 5 deletions

ambry-frontend/src/main/java/com/github/ambry/frontend/FrontendMetrics.java

Lines changed: 7 additions & 0 deletions
@@ -175,6 +175,11 @@ public class FrontendMetrics {
   public final AsyncOperationTracker.Metrics s3PutHandleMetrics;
   public final AsyncOperationTracker.Metrics s3GetHandleMetrics;

+  // Counts every per-object delete failure inside an S3 batch-delete request. Surfaces partial
+  // failures that S3 DeleteObjects returns inside the (HTTP 200) response body, which were
+  // previously invisible to operators without parsing every response payload.
+  public final Counter s3BatchDeleteSubOpFailureCount;
+
   // Rates
   // AmbrySecurityService
   public final Meter securityServicePreProcessRequestRate;
@@ -533,6 +538,8 @@ public FrontendMetrics(MetricRegistry metricRegistry, FrontendConfig frontendCon
     s3DeleteHandleMetrics = new AsyncOperationTracker.Metrics(S3DeleteHandler.class, "S3Handle", metricRegistry);
     s3BatchDeleteHandleMetrics =
         new AsyncOperationTracker.Metrics(S3BatchDeleteHandler.class, "S3Handle", metricRegistry);
+    s3BatchDeleteSubOpFailureCount =
+        metricRegistry.counter(MetricRegistry.name(S3BatchDeleteHandler.class, "SubOpFailureCount"));
     s3ListHandleMetrics = new AsyncOperationTracker.Metrics(S3ListHandler.class, "S3Handle", metricRegistry);
     s3PutHandleMetrics = new AsyncOperationTracker.Metrics(S3PutHandler.class, "S3Handle", metricRegistry);
     s3GetHandleMetrics = new AsyncOperationTracker.Metrics(S3GetHandler.class, "S3Handle", metricRegistry);
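
For dashboards and alerts, the name under which the counter is registered matters. As far as I know, Dropwizard's `MetricRegistry.name(Class, String...)` dot-joins the class's fully qualified name with the given suffixes. The sketch below mimics that joining with a hypothetical helper, `metricName`, instead of calling the library, so it runs on the JDK alone. The package name is inferred from the file path in this commit. Any prefix a metrics reporter might add is not shown.

```java
public class MetricNameSketch {
  // Hypothetical stand-in for MetricRegistry.name(Class, String...):
  // dot-joins the class name with every non-empty suffix.
  public static String metricName(String className, String... parts) {
    StringBuilder sb = new StringBuilder(className);
    for (String part : parts) {
      if (part != null && !part.isEmpty()) {
        sb.append('.').append(part);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Package inferred from ambry-frontend/src/main/java/com/github/ambry/frontend/s3/.
    System.out.println(
        metricName("com.github.ambry.frontend.s3.S3BatchDeleteHandler", "SubOpFailureCount"));
  }
}
```

Under that assumption, the counter would be looked up as `com.github.ambry.frontend.s3.S3BatchDeleteHandler.SubOpFailureCount`.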

ambry-frontend/src/main/java/com/github/ambry/frontend/s3/S3BatchDeleteHandler.java

Lines changed: 7 additions & 5 deletions
@@ -150,14 +150,16 @@ private Callback<Long> parseRequestBodyAndDeleteCallback(RetainingAsyncWritableC

           // Handle the delete operation using the deleteBlobHandler
           deleteBlobHandler.handle(singleDeleteRequest, noOpResponseChannel, (result, exception) -> {
-            // Call our custom onDeleteCompletion to track success/failure
             if (exception == null) {
               deleted.add(new S3MessagePayload.S3DeletedObject(object.getKey()));
-            } else if (exception instanceof RestServiceException) {
-              RestServiceException restServiceException = (RestServiceException) exception;
-              errors.add((new S3MessagePayload.S3ErrorObject(object.getKey(), restServiceException.getErrorCode().toString())));
             } else {
-              errors.add((new S3MessagePayload.S3ErrorObject(object.getKey(), RestServiceErrorCode.InternalServerError.toString())));
+              String errorCode = (exception instanceof RestServiceException)
+                  ? ((RestServiceException) exception).getErrorCode().toString()
+                  : RestServiceErrorCode.InternalServerError.toString();
+              errors.add(new S3MessagePayload.S3ErrorObject(object.getKey(), errorCode));
+              metrics.s3BatchDeleteSubOpFailureCount.inc();
+              logger.warn("S3 batch delete sub-op failed: key={} errorCode={} requestUri={}",
+                  object.getKey(), errorCode, restRequest.getUri(), exception);
             }
             future.complete(null);
           });
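
The callback above boils down to: on success record the key, on failure record an error entry, bump a counter, and log with the exception attached. A minimal, JDK-only sketch of that pattern follows. It is not the Ambry code: `AtomicLong` stands in for the Codahale `Counter`, `java.util.logging` stands in for SLF4J, and `CodedException`, `onSubOpComplete`, and the request URI are hypothetical names chosen for illustration. The sketch is single-threaded and makes no claim about how the real handler synchronizes its lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Level;
import java.util.logging.Logger;

public class BatchDeleteSketch {
  private static final Logger LOGGER = Logger.getLogger(BatchDeleteSketch.class.getName());

  // Stand-in for FrontendMetrics.s3BatchDeleteSubOpFailureCount.
  public final AtomicLong subOpFailureCount = new AtomicLong();
  public final List<String> deleted = new ArrayList<>();
  public final List<String> errors = new ArrayList<>();

  // Hypothetical exception carrying an error code, standing in for RestServiceException.
  public static class CodedException extends Exception {
    public final String errorCode;

    public CodedException(String errorCode) {
      super(errorCode);
      this.errorCode = errorCode;
    }
  }

  // Mirrors the callback body: record success, or record, count, and log the failure.
  public void onSubOpComplete(String key, Exception exception, String requestUri) {
    if (exception == null) {
      deleted.add(key);
      return;
    }
    // Unknown exception types fall back to a generic internal-error code.
    String errorCode = (exception instanceof CodedException)
        ? ((CodedException) exception).errorCode
        : "InternalServerError";
    errors.add(key + ":" + errorCode);
    subOpFailureCount.incrementAndGet();
    // Passing the exception as the last argument keeps the stack trace in the log record.
    LOGGER.log(Level.WARNING,
        String.format("S3 batch delete sub-op failed: key=%s errorCode=%s requestUri=%s",
            key, errorCode, requestUri),
        exception);
  }

  public static void main(String[] args) {
    BatchDeleteSketch sketch = new BatchDeleteSketch();
    String uri = "/s3/example-bucket?delete"; // hypothetical request URI
    sketch.onSubOpComplete("a.txt", null, uri);
    sketch.onSubOpComplete("b.txt", new CodedException("NotFound"), uri);
    sketch.onSubOpComplete("c.txt", new RuntimeException("boom"), uri);
    System.out.println("deleted=" + sketch.deleted + " errors=" + sketch.errors
        + " failures=" + sketch.subOpFailureCount.get());
  }
}
```

Collapsing the two error branches into one, as the commit does, means the counter and the log line each live in a single place, so a failure cannot be recorded in the response without also being counted and logged.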
