perf(api): use literal scan_ids in finding-groups /latest aggregation by AdriiiPRodri · Pull Request #11380 · prowler-cloud/prowler

AdriiiPRodri · 2026-05-28T08:40:47Z

Context

GET /api/v1/finding-groups/latest?filter[region__in]=... was multi-second on tenants where a single latest completed scan held a large share of findings. The finding-level aggregation (FindingGroupViewSet._aggregate_findings) restricts to each provider's most recent completed scan via:

scan_id IN (SELECT DISTINCT ON (provider_id) id FROM scans
            WHERE state='completed' AND tenant_id=...
            ORDER BY provider_id, completed_at DESC, inserted_at DESC)

When latest_scan_ids was left as a Django QuerySet, this rendered as an inline subquery. Postgres cannot push selectivity statistics through that shape and underestimated the match count badly (estimated ~11k rows vs actual 70k-700k depending on how concentrated the latest scan is). Under that estimate the planner chose a serial Nested Loop into resource_finding_mappings for the two COUNT(DISTINCT resource_id) and a single ~62 MB external-merge sort on (check_id, resource_id), making the endpoint multi-second whenever a single latest scan held a large share of findings.

Description

_get_latest_findings_per_provider now resolves the scan ids into a concrete Python list before filtering (list(... .values_list("id", flat=True))). The main query then emits a literal scan_id IN (uuid, uuid, ...). With an accurate row estimate the planner switches to a Parallel Hash Join and per-worker sorts.

The change is behaviour-preserving: same scan ids, byte-identical 300-group output (verified by md5-equal raw SQL output and the full TestFindingGroupViewSet pytest suite, 160/160 passing). The added cost is one extra small indexed lookup of one scan id per provider before the main query runs. Scope is surgical: only the finding-level branch of /finding-groups/latest. The pre-aggregated FindingGroupDailySummary path and the date-filtered /finding-groups list path are untouched.

Steps to review

Read the diff (one method, +14/-3 in api/src/backend/api/v1/views.py: _get_latest_findings_per_provider).
Confirm the new code resolves the scan ids into a Python list before filtering and that the rest of the method semantics are identical (same Scan.objects.filter(...) chain, same scan_id__in filter).
(Optional) Run the test suite locally: pytest api/tests/test_views.py::TestFindingGroupViewSet.

API evidence

EXPLAIN ANALYZE on the original pasted query, worst shape (2M findings, ~1.8M under the latest scans, ~629k matching us-east-1; Postgres 16, work_mem=4MB, jit=on, warm):

Before (scan_id IN (subquery)):

Execution Time:  32609 ms
Planning Time:     195 ms
Plan: Nested Loop Left Join (serial) into resource_finding_mappings
Sort Method:     external merge  Disk: 952 MB
JIT functions:   155
Workers Launched: 0

After (scan_id IN (literal list)):

Execution Time:   8878 ms
Planning Time:     20 ms
Plan: Parallel Hash Join + Parallel Append into resource_finding_mappings
Sort Method:     external merge  Disk: 322 MB (per worker ~315 MB)
JIT functions:   275
Workers Launched: 2

Performance results, ORM end-to-end on the same 2M-finding seeded tenant (warm):

Shape	Old (subquery)	New (literal list)	Speedup
Normal (~10% findings under latest scan, ~70k matching)	~1.4 s	~0.86 s	~1.7x
Single huge latest scan (~700k matching)	~11.3 s warm SQL / ~17.8 s ORM cold	~3.1 s	~3.6x / ~5.7x
Fan-out (3 distinct resources per finding, ~2.1 M join rows)	~13.5 s	~5.3 s	~2.6x

Checklist

Behaviour preserved: identical 300-group output (md5-equal raw SQL and 160/160 TestFindingGroupViewSet tests passing).
EXPLAIN ANALYZE and performance results included above.
No API spec changes.
No version updates required.
No CHANGELOG entry (no-changelog).

License

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

The /finding-groups/latest finding-level path (FindingGroupViewSet._aggregate_findings, taken when a finding-level-only filter such as region__in is present) restricts to each provider's most recent completed scan with: scan_id IN (SELECT DISTINCT ON (provider_id) id FROM scans WHERE state='completed' AND tenant_id=... ORDER BY provider_id, completed_at DESC, inserted_at DESC) When latest_scan_ids was left as a Django QuerySet, this rendered as an inline subquery. The Postgres planner cannot push selectivity stats through that shape and underestimates the match count badly (estimated ~11k rows vs actual 70k-700k depending on how concentrated the latest scan is). Under that estimate it chose a serial nested loop into resource_finding_mappings for the two COUNT(DISTINCT resource_id) and a single ~62 MB external-merge sort on (check_id, resource_id), making the endpoint multi-second whenever a single latest scan held a large share of findings. EXPLAIN ANALYZE on the original pasted query at the worst shape showed 32.6 s with a Nested Loop Left Join, a 952 MB sort spill and no parallel workers. _get_latest_findings_per_provider now resolves the scan ids into a concrete Python list before filtering (list(... .values_list("id", flat=True))). The main query then emits a literal scan_id IN (uuid, uuid, ...), the planner gets an accurate row estimate, and switches to a Parallel Hash Join into resource_finding_mappings (Workers Launched: 2 on the dev stack) with per-worker sorts. The same EXPLAIN at the worst shape drops to 8.9 s. The change is behaviour-preserving: same scan ids, byte-identical 300-group output (verified by md5-equal raw SQL output and the full TestFindingGroupViewSet pytest suite, 160/160 passing). The added cost is one extra small indexed lookup of one scan id per provider before the main query runs. Measured end-to-end through the ORM on a 2M-finding seeded tenant (Postgres 16, work_mem=4MB, jit=on, warm): - Normal shape (~10% of findings under the latest scan, ~70k matching the region filter): ~1.4 s -> ~0.86 s (~1.7x). - Single huge latest scan (~700k matching): ~11.3 s warm SQL / ~17.8 s ORM cold -> ~3.1 s (~3.6-5.7x). - Resource fan-out (3 distinct resources per finding, mappings join widens to ~2.1 M rows): ~13.5 s -> ~5.3 s (~2.6x). Scope is surgical: only _get_latest_findings_per_provider, used by the finding-level branch of /finding-groups/latest. The pre-aggregated FindingGroupDailySummary path and the date-filtered /finding-groups list path are untouched.

github-actions · 2026-05-28T08:41:21Z

✅ Conflict Markers Resolved

All conflict markers have been successfully resolved in this pull request.

github-actions · 2026-05-28T08:45:39Z

🔒 Container Security Scan

Image: prowler-api:a1f1fda
Last scan: 2026-05-28 11:32:00 UTC

📊 Vulnerability Summary

Severity	Count
🔴 Critical	21
Total	21

15 package(s) affected

⚠️ Action Required

Critical severity vulnerabilities detected. These should be addressed before merging:

Review the detailed scan results
Update affected packages to patched versions
Consider using a different base image if updates are unavailable

📋 Resources:

Download full report (see artifacts)
View in Security tab
Scanned with Trivy

codecov · 2026-05-28T08:54:14Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.96%. Comparing base (329dfdf) to head (73d3549).
⚠️ Report is 20 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff           @@
##           master   #11380   +/-   ##
=======================================
  Coverage   93.96%   93.96%           
=======================================
  Files         237      237           
  Lines       34901    34901           
=======================================
  Hits        32793    32793           
  Misses       2108     2108

Flag	Coverage Δ
api	`93.96% <100.00%> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Components	Coverage Δ
prowler	`∅ <ø> (∅)`
api	`93.96% <100.00%> (ø)`

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

josema-xyz

Do you like that proposed comment? That way we don't mention things like the 6.2x improvement, as that can be deprecated and the comment turns out wrong.

Co-authored-by: Josema Camacho <josema@prowler.com>

AdriiiPRodri requested a review from a team as a code owner May 28, 2026 08:40

AdriiiPRodri added component/api no-changelog Skip including change in changelog/release notes labels May 28, 2026

josema-xyz reviewed May 28, 2026

View reviewed changes

Comment thread api/src/backend/api/v1/views.py Outdated

Update views.py

7beec00

Co-authored-by: Josema Camacho <josema@prowler.com>

josema-xyz previously approved these changes May 28, 2026

View reviewed changes

chore: ruff format

73d3549

AdriiiPRodri dismissed josema-xyz’s stale review via 73d3549 May 28, 2026 11:29

josema-xyz approved these changes May 28, 2026

View reviewed changes

AdriiiPRodri merged commit 81226cd into master May 28, 2026
30 checks passed

AdriiiPRodri deleted the user-finding-query branch May 28, 2026 11:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

perf(api): use literal scan_ids in finding-groups /latest aggregation#11380

perf(api): use literal scan_ids in finding-groups /latest aggregation#11380
AdriiiPRodri merged 3 commits into
masterfrom
user-finding-query

AdriiiPRodri commented May 28, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 28, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 28, 2026 •

edited

Loading

Uh oh!

codecov Bot commented May 28, 2026 •

edited

Loading

Uh oh!

josema-xyz left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

AdriiiPRodri commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Context

Description

Steps to review

API evidence

Checklist

License

Uh oh!

github-actions Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔒 Container Security Scan

📊 Vulnerability Summary

⚠️ Action Required

Uh oh!

codecov Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

josema-xyz left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

AdriiiPRodri commented May 28, 2026 •

edited

Loading

github-actions Bot commented May 28, 2026 •

edited

Loading

github-actions Bot commented May 28, 2026 •

edited

Loading

codecov Bot commented May 28, 2026 •

edited

Loading