Skip to content

perf(api): use literal scan_ids in finding-groups /latest aggregation#11380

Merged
AdriiiPRodri merged 3 commits into
masterfrom
user-finding-query
May 28, 2026
Merged

perf(api): use literal scan_ids in finding-groups /latest aggregation#11380
AdriiiPRodri merged 3 commits into
masterfrom
user-finding-query

Conversation

@AdriiiPRodri
Copy link
Copy Markdown
Contributor

@AdriiiPRodri AdriiiPRodri commented May 28, 2026

Context

GET /api/v1/finding-groups/latest?filter[region__in]=... was multi-second on tenants where a single latest completed scan held a large share of findings. The finding-level aggregation (FindingGroupViewSet._aggregate_findings) restricts to each provider's most recent completed scan via:

scan_id IN (SELECT DISTINCT ON (provider_id) id FROM scans
            WHERE state='completed' AND tenant_id=...
            ORDER BY provider_id, completed_at DESC, inserted_at DESC)

When latest_scan_ids was left as a Django QuerySet, this rendered as an inline subquery. Postgres cannot push selectivity statistics through that shape and underestimated the match count badly (estimated ~11k rows vs actual 70k-700k depending on how concentrated the latest scan is). Under that estimate the planner chose a serial Nested Loop into resource_finding_mappings for the two COUNT(DISTINCT resource_id) and a single ~62 MB external-merge sort on (check_id, resource_id), making the endpoint multi-second whenever a single latest scan held a large share of findings.

Description

_get_latest_findings_per_provider now resolves the scan ids into a concrete Python list before filtering (list(... .values_list("id", flat=True))). The main query then emits a literal scan_id IN (uuid, uuid, ...). With an accurate row estimate the planner switches to a Parallel Hash Join and per-worker sorts.

The change is behaviour-preserving: same scan ids, byte-identical 300-group output (verified by md5-equal raw SQL output and the full TestFindingGroupViewSet pytest suite, 160/160 passing). The added cost is one extra small indexed lookup of one scan id per provider before the main query runs. Scope is surgical: only the finding-level branch of /finding-groups/latest. The pre-aggregated FindingGroupDailySummary path and the date-filtered /finding-groups list path are untouched.

Steps to review

  1. Read the diff (one method, +14/-3 in api/src/backend/api/v1/views.py: _get_latest_findings_per_provider).
  2. Confirm the new code resolves the scan ids into a Python list before filtering and that the rest of the method semantics are identical (same Scan.objects.filter(...) chain, same scan_id__in filter).
  3. (Optional) Run the test suite locally: pytest api/tests/test_views.py::TestFindingGroupViewSet.

API evidence

EXPLAIN ANALYZE on the original pasted query, worst shape (2M findings, ~1.8M under the latest scans, ~629k matching us-east-1; Postgres 16, work_mem=4MB, jit=on, warm):

Before (scan_id IN (subquery)):

Execution Time:  32609 ms
Planning Time:     195 ms
Plan: Nested Loop Left Join (serial) into resource_finding_mappings
Sort Method:     external merge  Disk: 952 MB
JIT functions:   155
Workers Launched: 0

After (scan_id IN (literal list)):

Execution Time:   8878 ms
Planning Time:     20 ms
Plan: Parallel Hash Join + Parallel Append into resource_finding_mappings
Sort Method:     external merge  Disk: 322 MB (per worker ~315 MB)
JIT functions:   275
Workers Launched: 2

Performance results, ORM end-to-end on the same 2M-finding seeded tenant (warm):

Shape Old (subquery) New (literal list) Speedup
Normal (~10% findings under latest scan, ~70k matching) ~1.4 s ~0.86 s ~1.7x
Single huge latest scan (~700k matching) ~11.3 s warm SQL / ~17.8 s ORM cold ~3.1 s ~3.6x / ~5.7x
Fan-out (3 distinct resources per finding, ~2.1 M join rows) ~13.5 s ~5.3 s ~2.6x

Checklist

  • Behaviour preserved: identical 300-group output (md5-equal raw SQL and 160/160 TestFindingGroupViewSet tests passing).
  • EXPLAIN ANALYZE and performance results included above.
  • No API spec changes.
  • No version updates required.
  • No CHANGELOG entry (no-changelog).

License

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

The /finding-groups/latest finding-level path
(FindingGroupViewSet._aggregate_findings, taken when a finding-level-only
filter such as region__in is present) restricts to each provider's most
recent completed scan with:

    scan_id IN (SELECT DISTINCT ON (provider_id) id FROM scans
                WHERE state='completed' AND tenant_id=...
                ORDER BY provider_id, completed_at DESC,
                         inserted_at DESC)

When latest_scan_ids was left as a Django QuerySet, this rendered as an
inline subquery. The Postgres planner cannot push selectivity stats
through that shape and underestimates the match count badly (estimated
~11k rows vs actual 70k-700k depending on how concentrated the latest
scan is). Under that estimate it chose a serial nested loop into
resource_finding_mappings for the two COUNT(DISTINCT resource_id) and a
single ~62 MB external-merge sort on (check_id, resource_id), making the
endpoint multi-second whenever a single latest scan held a large share
of findings. EXPLAIN ANALYZE on the original pasted query at the worst
shape showed 32.6 s with a Nested Loop Left Join, a 952 MB sort spill
and no parallel workers.

_get_latest_findings_per_provider now resolves the scan ids into a
concrete Python list before filtering
(list(... .values_list("id", flat=True))). The main query then emits a
literal scan_id IN (uuid, uuid, ...), the planner gets an accurate row
estimate, and switches to a Parallel Hash Join into
resource_finding_mappings (Workers Launched: 2 on the dev stack) with
per-worker sorts. The same EXPLAIN at the worst shape drops to 8.9 s.

The change is behaviour-preserving: same scan ids, byte-identical
300-group output (verified by md5-equal raw SQL output and the full
TestFindingGroupViewSet pytest suite, 160/160 passing). The added cost
is one extra small indexed lookup of one scan id per provider before the
main query runs.

Measured end-to-end through the ORM on a 2M-finding seeded tenant
(Postgres 16, work_mem=4MB, jit=on, warm):

- Normal shape (~10% of findings under the latest scan, ~70k matching
  the region filter): ~1.4 s -> ~0.86 s (~1.7x).
- Single huge latest scan (~700k matching): ~11.3 s warm SQL /
  ~17.8 s ORM cold -> ~3.1 s (~3.6-5.7x).
- Resource fan-out (3 distinct resources per finding, mappings join
  widens to ~2.1 M rows): ~13.5 s -> ~5.3 s (~2.6x).

Scope is surgical: only _get_latest_findings_per_provider, used by the
finding-level branch of /finding-groups/latest. The pre-aggregated
FindingGroupDailySummary path and the date-filtered /finding-groups
list path are untouched.
@AdriiiPRodri AdriiiPRodri requested a review from a team as a code owner May 28, 2026 08:40
@AdriiiPRodri AdriiiPRodri added component/api no-changelog Skip including change in changelog/release notes labels May 28, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 28, 2026

Conflict Markers Resolved

All conflict markers have been successfully resolved in this pull request.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 28, 2026

🔒 Container Security Scan

Image: prowler-api:a1f1fda
Last scan: 2026-05-28 11:32:00 UTC

📊 Vulnerability Summary

Severity Count
🔴 Critical 21
Total 21

15 package(s) affected

⚠️ Action Required

Critical severity vulnerabilities detected. These should be addressed before merging:

  • Review the detailed scan results
  • Update affected packages to patched versions
  • Consider using a different base image if updates are unavailable

📋 Resources:

@codecov
Copy link
Copy Markdown

codecov Bot commented May 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.96%. Comparing base (329dfdf) to head (73d3549).
⚠️ Report is 20 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #11380   +/-   ##
=======================================
  Coverage   93.96%   93.96%           
=======================================
  Files         237      237           
  Lines       34901    34901           
=======================================
  Hits        32793    32793           
  Misses       2108     2108           
Flag Coverage Δ
api 93.96% <100.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
prowler ∅ <ø> (∅)
api 93.96% <100.00%> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

@josema-xyz josema-xyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you like that proposed comment? That way we don't mention things like the 6.2x improvement, as that can be deprecated and the comment turns out wrong.

Comment thread api/src/backend/api/v1/views.py Outdated
Co-authored-by: Josema Camacho <josema@prowler.com>
josema-xyz
josema-xyz previously approved these changes May 28, 2026
@AdriiiPRodri AdriiiPRodri merged commit 81226cd into master May 28, 2026
30 checks passed
@AdriiiPRodri AdriiiPRodri deleted the user-finding-query branch May 28, 2026 11:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component/api no-changelog Skip including change in changelog/release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants