perf(api): use literal scan_ids in finding-groups /latest aggregation #11380
Merged
The /finding-groups/latest finding-level path
(FindingGroupViewSet._aggregate_findings, taken when a finding-level-only
filter such as region__in is present) restricts to each provider's most
recent completed scan with:
    scan_id IN (SELECT DISTINCT ON (provider_id) id FROM scans
                WHERE state='completed' AND tenant_id=...
                ORDER BY provider_id, completed_at DESC,
                         inserted_at DESC)
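For readers less used to Postgres's DISTINCT ON, here is a pure-Python sketch of what that subquery selects. It filters to one tenant's completed scans, then keeps one scan per provider, preferring the newest completed_at and breaking ties on the newest inserted_at. The Scan dataclass and latest_completed_scan_ids helper are hypothetical stand-ins, not the project's models. The sketch assumes every completed scan has completed_at set; Postgres would sort NULLs first under DESC.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Scan:
    # Hypothetical stand-in for a scans row; only the columns the query touches.
    id: str
    tenant_id: str
    provider_id: str
    state: str
    completed_at: datetime  # assumed non-NULL for completed scans
    inserted_at: datetime


def latest_completed_scan_ids(scans: Iterable[Scan], tenant_id: str) -> List[str]:
    """Mimic SELECT DISTINCT ON (provider_id) id ... ORDER BY provider_id,
    completed_at DESC, inserted_at DESC for a single tenant."""
    best: Dict[str, Scan] = {}
    for scan in scans:
        # WHERE state='completed' AND tenant_id=...
        if scan.state != "completed" or scan.tenant_id != tenant_id:
            continue
        current = best.get(scan.provider_id)
        # Keep the scan that sorts first: newest completed_at, then newest inserted_at.
        if current is None or (scan.completed_at, scan.inserted_at) > (
            current.completed_at,
            current.inserted_at,
        ):
            best[scan.provider_id] = scan
    # DISTINCT ON emits one row per provider_id, in provider_id order.
    return [best[provider].id for provider in sorted(best)]
```

Exact ties on both timestamps are left to whichever scan is seen first, which matches DISTINCT ON returning an unspecified row in that case.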
When latest_scan_ids was left as a Django QuerySet, this rendered as an
inline subquery. The Postgres planner cannot push selectivity stats
through that shape and underestimates the match count badly (estimated
~11k rows vs actual 70k-700k depending on how concentrated the latest
scan is). Under that estimate it chose a serial nested loop into
resource_finding_mappings for the two COUNT(DISTINCT resource_id)
aggregates and a single ~62 MB external-merge sort on
(check_id, resource_id), making the
endpoint multi-second whenever a single latest scan held a large share
of findings. EXPLAIN ANALYZE on the original pasted query at the worst
shape showed 32.6 s with a Nested Loop Left Join, a 952 MB sort spill
and no parallel workers.
_get_latest_findings_per_provider now resolves the scan ids into a
concrete Python list before filtering
(list(... .values_list("id", flat=True))). The main query then emits a
literal scan_id IN (uuid, uuid, ...), so the planner gets an accurate
row estimate and switches to a Parallel Hash Join into
resource_finding_mappings (Workers Launched: 2 on the dev stack) with
per-worker sorts. The same EXPLAIN at the worst shape drops to 8.9 s.
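The two query shapes can be compared in a small, self-contained sketch. It uses the standard-library sqlite3 module as a stand-in database, plus hypothetical scans and findings tables holding only the columns needed here. SQLite has no DISTINCT ON, so a ROW_NUMBER() window function (assumed available, SQLite 3.25+) picks the latest completed scan per provider. The sketch only shows that resolving the ids first and then sending an explicit IN list returns the same rows as the inline subquery. It says nothing about how SQLite or Postgres plan either form. In the actual change, the first step is the Django list(... .values_list("id", flat=True)) call.

```python
import sqlite3
from typing import List

# Latest completed scan per provider for one tenant. SQLite has no DISTINCT ON,
# so ROW_NUMBER() over each provider's scans (newest first) stands in for it.
LATEST_SCANS_SQL = """
    SELECT id FROM (
        SELECT id,
               ROW_NUMBER() OVER (
                   PARTITION BY provider_id
                   ORDER BY completed_at DESC, inserted_at DESC
               ) AS rn
        FROM scans
        WHERE state = 'completed' AND tenant_id = ?
    )
    WHERE rn = 1
"""


def findings_via_subquery(conn: sqlite3.Connection, tenant_id: str, region: str) -> List[str]:
    # Before: the scan filter is embedded as an inline subquery in the main statement.
    sql = (
        "SELECT id FROM findings WHERE region = ? "
        f"AND scan_id IN ({LATEST_SCANS_SQL}) ORDER BY id"
    )
    return [row[0] for row in conn.execute(sql, (region, tenant_id))]


def findings_via_literal_list(conn: sqlite3.Connection, tenant_id: str, region: str) -> List[str]:
    # After, step 1: resolve the latest scan ids into a concrete Python list.
    scan_ids = [row[0] for row in conn.execute(LATEST_SCANS_SQL, (tenant_id,))]
    if not scan_ids:
        return []  # an empty IN list can never match
    # Step 2: filter with an explicit IN (?, ?, ...) list, one placeholder per id.
    placeholders = ", ".join("?" for _ in scan_ids)
    sql = f"SELECT id FROM findings WHERE region = ? AND scan_id IN ({placeholders}) ORDER BY id"
    return [row[0] for row in conn.execute(sql, (region, *scan_ids))]
```

The only extra cost in the second form is the small first query, which returns one id per provider. This mirrors the trade-off described above.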
The change is behaviour-preserving: same scan ids, byte-identical
300-group output (verified by md5-equal raw SQL output and the full
TestFindingGroupViewSet pytest suite, 160/160 passing). The added cost
is one extra small indexed lookup of one scan id per provider before the
main query runs.
Measured end-to-end through the ORM on a 2M-finding seeded tenant
(Postgres 16, work_mem=4MB, jit=on, warm):
- Normal shape (~10% of findings under the latest scan, ~70k matching
the region filter): ~1.4 s -> ~0.86 s (~1.7x).
- Single huge latest scan (~700k matching): ~11.3 s warm SQL /
~17.8 s ORM cold -> ~3.1 s (~3.6-5.7x).
- Resource fan-out (3 distinct resources per finding, mappings join
  widens to ~2.1M rows): ~13.5 s -> ~5.3 s (~2.6x).
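As a quick arithmetic check, the speedups can be recomputed from the rounded timings quoted in this message. From these inputs, the normal-shape and fan-out ratios come out at about 1.63x and 2.55x, not ~1.7x and ~2.6x. The gap is within the rounding of the "~" figures (for example 1.45 / 0.86 ≈ 1.69), so the reported ratios are consistent. The measurements dictionary below simply restates the quoted numbers.

```python
# Speedups recomputed from the rounded timings quoted above (seconds, before/after).
measurements = {
    "normal shape": (1.4, 0.86),
    "huge latest scan, warm SQL": (11.3, 3.1),
    "huge latest scan, cold ORM": (17.8, 3.1),
    "resource fan-out": (13.5, 5.3),
    "EXPLAIN ANALYZE, worst shape": (32.6, 8.9),
}


def speedup(before: float, after: float) -> float:
    """Ratio of old to new wall-clock time."""
    return before / after


for name, (before, after) in measurements.items():
    print(f"{name}: {speedup(before, after):.2f}x")
```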
Scope is surgical: only _get_latest_findings_per_provider, used by the
finding-level branch of /finding-groups/latest. The pre-aggregated
FindingGroupDailySummary path and the date-filtered /finding-groups
list path are untouched.
✅ Conflict Markers Resolved: All conflict markers have been successfully resolved in this pull request.
🔒 Container Security Scan
📊 Vulnerability Summary: 15 package(s) affected
Codecov Report
✅ All modified and coverable lines are covered by tests.
    @@           Coverage Diff           @@
    ##           master   #11380   +/-   ##
    =======================================
      Coverage   93.96%   93.96%
    =======================================
      Files         237      237
      Lines       34901    34901
    =======================================
      Hits        32793    32793
      Misses       2108     2108
Flags with carried forward coverage won't be shown.
josema-xyz reviewed on May 28, 2026:
Do you like that proposed comment? That way we don't mention things like the 6.2x improvement, which can go out of date and leave the comment wrong.
Co-authored-by: Josema Camacho <josema@prowler.com>
josema-xyz previously approved these changes on May 28, 2026.
josema-xyz approved these changes on May 28, 2026.
Context
GET /api/v1/finding-groups/latest?filter[region__in]=... was multi-second on tenants where a single latest completed scan held a large share of findings. The finding-level aggregation (FindingGroupViewSet._aggregate_findings) restricts to each provider's most recent completed scan via scan_id IN (SELECT DISTINCT ON (provider_id) id FROM scans WHERE state='completed' ...). When latest_scan_ids was left as a Django QuerySet, this rendered as an inline subquery. Postgres cannot push selectivity statistics through that shape and underestimated the match count badly (estimated ~11k rows vs actual 70k-700k depending on how concentrated the latest scan is). Under that estimate the planner chose a serial Nested Loop into resource_finding_mappings for the two COUNT(DISTINCT resource_id) aggregates and a single ~62 MB external-merge sort on (check_id, resource_id), which is what made the endpoint multi-second.
Description
_get_latest_findings_per_provider now resolves the scan ids into a concrete Python list before filtering (list(... .values_list("id", flat=True))). The main query then emits a literal scan_id IN (uuid, uuid, ...). With an accurate row estimate, the planner switches to a Parallel Hash Join and per-worker sorts.
The change is behaviour-preserving: same scan ids, byte-identical 300-group output (verified by md5-equal raw SQL output and the full TestFindingGroupViewSet pytest suite, 160/160 passing). The added cost is one extra small indexed lookup of one scan id per provider before the main query runs. Scope is surgical: only the finding-level branch of /finding-groups/latest. The pre-aggregated FindingGroupDailySummary path and the date-filtered /finding-groups list path are untouched.
Steps to review
- Diff: api/src/backend/api/v1/views.py (_get_latest_findings_per_provider).
- The query is unchanged apart from the list(...) wrap (same Scan.objects.filter(...) chain, same scan_id__in filter).
- Run: pytest api/tests/test_views.py::TestFindingGroupViewSet
API evidence
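EXPLAIN ANALYZE on the original pasted query, worst shape (2M findings, ~1.8M under the latest scans, ~629k matching us-east-1; Postgres 16, work_mem=4MB, jit=on, warm):
- Before (scan_id IN (subquery)): 32.6 s, Nested Loop Left Join, 952 MB sort spill, no parallel workers.
- After (scan_id IN (literal list)): 8.9 s, Parallel Hash Join into resource_finding_mappings, Workers Launched: 2, per-worker sorts.
Performance results, ORM end-to-end on the same 2M-finding seeded tenant (warm):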
Checklist
- … TestFindingGroupViewSet tests passing.
- … no-changelog.
License
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.