perf(api): reduce DB load in scan hot loop by 13x #11249

Open
AdriiiPRodri wants to merge 7 commits into master from perf/api-scan-loop-bulk-operations
@AdriiiPRodri
Contributor

Context

The per-finding loop in tasks/jobs/scan.py was the dominant bottleneck of perform_prowler_scan: for each finding, it issued multiple SELECT FOR UPDATE against resource_tag_mappings, opened a transaction per item, and re-fetched Resource/ResourceTag rows that were already known. For large scans this produced thousands of round-trips and lock contention, slowing the hot loop and increasing DB load.

This PR rewrites the micro-batch path to be set-oriented and atomic, with no schema changes.

Description

Changes in api/src/backend/tasks/jobs/scan.py:

  • Pre-resolve Resource and ResourceTag rows in bulk before the per-finding loop, instead of per item.
  • Replace Resource.upsert_or_delete_tags (which issued SELECT FOR UPDATE per mapping) with deferred ResourceTagMapping.bulk_create(ignore_conflicts=True) executed once at the end of the batch.
  • Wrap the entire micro-batch in a single rls_transaction (previously 2N transactions for N findings). Deadlock retry now operates at the batch level.
  • Populate Finding.resource_regions, resource_services and resource_types directly on INSERT, removing the post-INSERT bulk_update pass.
  • Raise SCAN_DB_BATCH_SIZE from 500 to 1000.
  • Add update_fields=[...] to Scan / Provider saves to avoid full-row writes.
  • Throttle progress saves to either a 1% delta or a 10s interval (whichever comes first).
  • Preserve findings with empty resource_uid (IaC scans, some Azure/GCP/K8s findings).
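
The progress throttling rule from the list above can be sketched in isolation. This is a minimal illustration, not the code in scan.py: the `ProgressThrottle` class, its field names, and the 0.0 to 1.0 progress scale are assumptions made for the example.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProgressThrottle:
    """Decides when a progress value is worth persisting (hypothetical helper)."""

    min_delta: float = 0.01     # save once progress moved by at least 1%
    min_interval: float = 10.0  # or once 10 seconds passed since the last save
    last_progress: float = 0.0
    last_saved_at: float = field(default_factory=time.monotonic)

    def should_save(self, progress: float, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        delta_reached = progress - self.last_progress >= self.min_delta
        interval_reached = now - self.last_saved_at >= self.min_interval
        if delta_reached or interval_reached:
            # Remember what was saved so the next call measures from here.
            self.last_progress = progress
            self.last_saved_at = now
            return True
        return False
```

A caller would check `should_save(...)` on every progress update and only issue the `Scan` save when it returns `True`, which bounds the number of writes regardless of how many findings are processed.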

No schema changes. No migrations. Behaviorally, the micro-batch is now atomic: errors that were previously masked by per-finding SAVEPOINTs may now surface in logs (the batch is retried on deadlock).
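
Batch-level deadlock retry can be pictured roughly as follows. This is a sketch under stated assumptions: `DeadlockDetected`, `run_batch_with_retry`, and the retry parameters are hypothetical names for illustration, and the callable stands in for the body that would run inside one `rls_transaction`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class DeadlockDetected(Exception):
    """Stand-in for the database driver's deadlock error (hypothetical)."""


def run_batch_with_retry(
    batch_fn: Callable[[], T], max_attempts: int = 3, base_delay: float = 0.0
) -> T:
    """Run the whole batch again if it deadlocks; re-raise after the last attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            # In the real task this would be the atomic micro-batch.
            return batch_fn()
        except DeadlockDetected:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * attempt)  # simple linear backoff
    raise ValueError("max_attempts must be at least 1")


# Simulated batch that deadlocks twice before committing.
attempts = []


def flaky_batch() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise DeadlockDetected()
    return "committed"


result = run_batch_with_retry(flaky_batch)
```

Because the unit of retry is the whole batch, every write inside it has to be safe to repeat, which is why the conflict-ignoring inserts matter.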

Measured impact (3000 findings per micro-batch)

| Metric | Before | After | Delta |
| --- | --- | --- | --- |
| Wall-clock | 20.8s | 1.57s | 13.2x faster |
| COMMIT count | 6003 | 2 | -99.97% |
| SELECT FOR UPDATE on resource_tag_mappings | 15000 | 0 | -100% |
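
The deltas follow directly from the before and after figures. A quick arithmetic check, using only the numbers reported above (the variable names are illustrative):

```python
findings_per_batch = 3000

before_seconds, after_seconds = 20.8, 1.57
speedup = before_seconds / after_seconds

commits_before, commits_after = 6003, 2
commit_reduction_pct = (commits_before - commits_after) / commits_before * 100

# Average number of row-locking SELECTs per finding before the change.
locks_per_finding = 15000 / findings_per_batch

print(f"speedup: {speedup:.1f}x")
print(f"COMMIT reduction: {commit_reduction_pct:.2f}%")
print(f"SELECT FOR UPDATE per finding (before): {locks_per_finding:.0f}")
```

The 6003 commits are consistent with roughly two transactions per finding (2 × 3000) plus a few batch-level ones.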

Steps to review

  1. Read api/src/backend/tasks/jobs/scan.py end-to-end; the change is concentrated in a single file.
  2. Confirm the bulk pre-resolution step covers all Resource / ResourceTag lookups previously done inside the loop.
  3. Verify that the deferred ResourceTagMapping.bulk_create(ignore_conflicts=True) correctly replaces the previous per-mapping upsert path (idempotent on retry).
  4. Check the new transaction boundary: one rls_transaction per micro-batch instead of per finding; confirm the deadlock retry path still re-runs the whole batch safely.
  5. Confirm that empty resource_uid findings (IaC, some Azure/GCP/K8s) are kept and stored.
  6. Run the API test suite focused on scan and reports tasks.
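
Step 3 depends on conflict-ignoring inserts being idempotent when a batch is retried. The property can be illustrated without Django or PostgreSQL; the sketch below swaps in SQLite's `INSERT OR IGNORE` from the Python standard library, and the table layout is an assumption rather than the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resource_tag_mappings ("
    " resource_id INTEGER NOT NULL,"
    " tag_id INTEGER NOT NULL,"
    " UNIQUE (resource_id, tag_id))"
)


def insert_mappings(pairs):
    """Insert all pairs in one transaction, silently skipping existing ones."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO resource_tag_mappings (resource_id, tag_id)"
            " VALUES (?, ?)",
            pairs,
        )


batch = [(1, 10), (1, 11), (2, 10)]
insert_mappings(batch)
insert_mappings(batch)  # simulated retry of the same batch

row_count = conn.execute(
    "SELECT COUNT(*) FROM resource_tag_mappings"
).fetchone()[0]
```

Running the same batch twice leaves the table with one row per unique pair, which is the behavior a reviewer would want to confirm for the deferred `bulk_create(ignore_conflicts=True)` call.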

Checklist

Community Checklist
  • This feature/issue is listed in here or roadmap.prowler.com
  • Is it assigned to me, if not, request it via the issue/feature in here or Prowler Community Slack

SDK/CLI

  • Are there new checks included in this PR? No

API

  • All issue/task requirements work as expected on the API
  • Endpoint response output (if applicable)
  • EXPLAIN ANALYZE output for new/modified queries or indexes (if applicable)
  • Performance test results (if applicable)
  • Any other relevant evidence of the implementation (if applicable)
  • Verify if API specs need to be regenerated.
  • Check if version updates are required (e.g., specs, uv, etc.).
  • Ensure new entries are added to CHANGELOG.md, if applicable.

License

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

- Pre-resolve Resources and ResourceTags in bulk before the per-finding loop.
- Replace `Resource.upsert_or_delete_tags` with deferred
  `ResourceTagMapping.bulk_create(ignore_conflicts=True)` at end of batch
  (eliminates the per-mapping `SELECT FOR UPDATE`).
- Wrap the entire micro-batch in a single `rls_transaction` (was 2N); deadlock
  retry now per-batch.
- Populate `Finding.resource_regions/services/types` on INSERT, dropping the
  post-INSERT `bulk_update`.
- Raise `SCAN_DB_BATCH_SIZE` from 500 to 1000.
- Add `update_fields=[...]` to `Scan`/`Provider` saves; throttle progress saves
  to 1% delta or 10s.
- Preserve findings with empty `resource_uid` (IaC scans, some Azure/GCP/K8s).

Measured (3000 findings per micro-batch):
- Wall-clock 20.8s -> 1.57s (13.2x)
- COMMITs 6003 -> 2
- SELECT FOR UPDATE on resource_tag_mappings 15000 -> 0

No schema changes. No migrations. Micro-batches are now atomic: errors
previously masked by per-finding SAVEPOINTs may surface in logs.
@AdriiiPRodri AdriiiPRodri requested a review from a team as a code owner May 20, 2026 08:34
@AdriiiPRodri AdriiiPRodri added the no-changelog Skip including change in changelog/release notes label May 20, 2026
@github-actions
Contributor

github-actions Bot commented May 20, 2026

Conflict Markers Resolved

All conflict markers have been successfully resolved in this pull request.

@github-actions
Contributor

github-actions Bot commented May 20, 2026

🔒 Container Security Scan

Image: prowler-api:bea8fc8
Last scan: 2026-05-21 11:41:27 UTC

📊 Vulnerability Summary

| Severity | Count |
| --- | --- |
| 🔴 Critical | 14 |
| Total | 14 |

12 package(s) affected

⚠️ Action Required

Critical severity vulnerabilities detected. These should be addressed before merging:

  • Review the detailed scan results
  • Update affected packages to patched versions
  • Consider using a different base image if updates are unavailable

@AdriiiPRodri AdriiiPRodri removed the no-changelog Skip including change in changelog/release notes label May 20, 2026
@github-actions
Contributor

github-actions Bot commented May 20, 2026

✅ All necessary CHANGELOG.md files have been updated.

@codecov

codecov Bot commented May 20, 2026

Codecov Report

❌ Patch coverage is 89.01734% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.95%. Comparing base (6eebfcf) to head (e5c13b2).
⚠️ Report is 33 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #11249      +/-   ##
==========================================
- Coverage   93.97%   93.95%   -0.02%     
==========================================
  Files         237      237              
  Lines       34829    34877      +48     
==========================================
+ Hits        32729    32770      +41     
- Misses       2100     2107       +7     
| Flag | Coverage Δ |
| --- | --- |
| api | 93.95% <89.01%> (-0.02%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Components | Coverage Δ |
| --- | --- |
| prowler | ∅ <ø> (∅) |
| api | 93.95% <89.01%> (-0.02%) ⬇️ |

@AdriiiPRodri AdriiiPRodri force-pushed the perf/api-scan-loop-bulk-operations branch from 23e53a2 to 8623129 Compare May 20, 2026 08:51
Comment thread api/src/backend/tasks/jobs/scan.py Outdated
Comment on lines +871 to +872
for m in created_tag_mappings:
    if m.pk is not None:
Contributor

created_tag_mappings is built with ignore_conflicts=True.
That means the database does not return the IDs of the inserted rows, so we will never enter this if, right?

Contributor Author

Good catch. Fixed with a pre-SELECT of existing (resource_id, tag_id) pairs, so updated_at is bumped only on resources that actually gain a mapping.
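
The pre-SELECT approach described in this reply amounts to a set difference. A minimal sketch, assuming integer IDs and a hypothetical helper name (`plan_tag_mappings`); in the real code the existing pairs would come from a query rather than a literal set:

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[int, int]  # (resource_id, tag_id); integer IDs are an assumption


def plan_tag_mappings(
    candidates: Iterable[Pair], existing: Iterable[Pair]
) -> Tuple[Set[Pair], Set[int]]:
    """Return the pairs that still need inserting and the resources they touch."""
    new_pairs = set(candidates) - set(existing)
    touched_resources = {resource_id for resource_id, _ in new_pairs}
    return new_pairs, touched_resources


existing_pairs = {(1, 10), (2, 10)}  # result of the pre-SELECT
candidate_pairs = [(1, 10), (1, 11), (2, 10), (3, 12)]

new_pairs, touched = plan_tag_mappings(candidate_pairs, existing_pairs)
```

Here only resources 1 and 3 gain a mapping, so only they would have `updated_at` bumped. The decision no longer depends on whether the insert populates `pk`.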

Comment on lines +899 to +905
inserted = sum(1 for m in created_mappings if m.pk)
if inserted != len(mappings_to_create):
    logger.error(
        f"scan {scan_instance.id}: expected "
        f"{len(mappings_to_create)} ResourceFindingMapping rows, "
        f"inserted {inserted}. Rolling back micro-batch."
    )
Contributor

Following from the other comment: if pk is None, inserted will be 0.
We will log "expected N rows, inserted 0".

Contributor Author

You're right: ignore_conflicts=True does not populate pk, so this branch fires on every successful batch. It is pre-existing from #10724, though, so I'm keeping it as-is to scope this PR to the perf rewrite and opening a separate fix for the silent-failure detection.

@Davidm4r Davidm4r self-requested a review May 21, 2026 13:18