MINOR: Reduce close delay in testFailureToFenceEpoch#22347
Conversation
|
I used the following loop to re-run both affected tests 10 times and summarize their runtimes. It removes only the test result directories between runs, so Gradle can keep compilation tasks up-to-date while still re-running the tests. OUT=/tmp/fence-epoch-runs.tsv
XML=storage/build/test-results/test/TEST-org.apache.kafka.tiered.storage.integration.TransactionsTest.xml
: > "$OUT"
for run in $(seq 1 10); do
echo "===== run ${run}/10 ====="
rm -rf storage/build/test-results/test storage/build/reports/tests/test storage/build/test-results/binary/test
./gradlew :storage:test \
--tests org.apache.kafka.tiered.storage.integration.TransactionsTest.testFailureToFenceEpochWithTV1 \
--tests org.apache.kafka.tiered.storage.integration.TransactionsTest.testFailureToFenceEpochWithTV2
sed -n 's/.*<testcase name="\([^"]*testFailureToFenceEpochWithTV[12][^"]*\)".* time="\([^"]*\)".*/\1\t\2/p' "$XML" |
awk -F "\t" -v run="$run" '{
test = $1
sub(/^.*testFailureToFenceEpochWithTV/, "testFailureToFenceEpochWithTV", test)
sub(/ .*/, "", test)
print run "\t" test "\t" $2
}' >> "$OUT"
done
summarize() {
local label="$1"
awk -v label="$label" '
{ times[++count] = $1; sum += $1; if ($1 > max) max = $1 }
END {
if (count == 0) {
printf "%s: count=0\n", label
exit
}
if (count % 2 == 1) {
median = times[(count + 1) / 2]
} else {
median = (times[count / 2] + times[count / 2 + 1]) / 2
}
printf "%s: count=%d, max=%.3fs, median=%.3fs, avg=%.3fs\n", label, count, max, median, sum / count
}
'
}
for test_name in testFailureToFenceEpochWithTV1 testFailureToFenceEpochWithTV2; do
awk -v test="$test_name" '$2 == test { print $3 }' "$OUT" | sort -n | summarize "$test_name"
done
awk '{ print $3 }' "$OUT" | sort -n | summarize "ALL"Results:
|
| long producerId; | ||
| int initialProducerEpoch; | ||
|
|
||
| try (var producer1 = createTransactionalProducer(clusterInstance, "transactional-producer")) { |
There was a problem hiding this comment.
How about producer1.close(duration 5 secs) in finally ? This will force a hard close after the time out, instead of waiting for pending reqs.
There was a problem hiding this comment.
How about producer1.close(duration 5 secs) in finally ? This will force a hard close after the time out, instead of waiting for pending reqs.
Thanks, that’s a fair point. I considered close(Duration.ofSeconds(5)) as well, but it changes the cleanup behavior to force-close after the timeout. The current approach keeps the try-with-resources close graceful, while bounding the pending requests with a shorter request timeout. Maybe we should decide which behavior is more appropriate for this test.
Reduce the request timeout for the producer used by testFailureToFenceEpoch so graceful close does not wait on the default 30s timeout after the fencing probe leaves the producer in an empty transaction.
Tested by running testFailureToFenceEpochWithTV1 and testFailureToFenceEpochWithTV2.