perf(api): reduce DB load in scan hot loop by 13x by AdriiiPRodri · Pull Request #11249 · prowler-cloud/prowler

AdriiiPRodri · 2026-05-20T08:34:36Z

Context

The per-finding loop in tasks/jobs/scan.py was the dominant bottleneck of perform_prowler_scan: for each finding, it issued multiple SELECT FOR UPDATE against resource_tag_mappings, opened a transaction per item, and re-fetched Resource/ResourceTag rows that were already known. For large scans this produced thousands of round-trips and lock contention, slowing the hot loop and increasing DB load.

This PR rewrites the micro-batch path to be set-oriented and atomic, with no schema changes.
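As a rough illustration of what "set-oriented" means here, the sketch below resolves every resource a batch needs with two statements, then lets the per-finding loop use plain dictionary lookups. It is not the PR's code: it uses the standard-library `sqlite3` module as a stand-in for PostgreSQL and the Django ORM, and the `resources` table and the `resolve_resource_ids` helper are hypothetical names chosen for the example.

```python
import sqlite3


def resolve_resource_ids(conn, uids):
    """Map each uid to its row id using two statements for the whole batch."""
    unique = sorted(set(uids))
    if not unique:
        return {}
    # One multi-row insert creates any missing rows; existing ones are skipped.
    conn.execute(
        "INSERT OR IGNORE INTO resources (uid) VALUES "
        + ",".join(["(?)"] * len(unique)),
        unique,
    )
    # One SELECT fetches every id the loop will need.
    placeholders = ",".join("?" * len(unique))
    rows = conn.execute(
        f"SELECT uid, id FROM resources WHERE uid IN ({placeholders})", unique
    )
    return dict(rows.fetchall())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resources (id INTEGER PRIMARY KEY, uid TEXT UNIQUE)")
conn.execute("INSERT INTO resources (uid) VALUES ('arn:a')")  # already known

finding_uids = ["arn:a", "arn:b", "arn:a", "arn:c"]
resource_ids = resolve_resource_ids(conn, finding_uids)

# The hot loop now does no database work for resource resolution.
resolved = [resource_ids[uid] for uid in finding_uids]
```

The number of statements stays constant regardless of how many findings reference the same resource, which is the property that removes the per-item round-trips.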

Description

Changes in api/src/backend/tasks/jobs/scan.py:

- Pre-resolve `Resource` and `ResourceTag` rows in bulk before the per-finding loop, instead of per item.
- Replace `Resource.upsert_or_delete_tags` (which issued `SELECT FOR UPDATE` per mapping) with a deferred `ResourceTagMapping.bulk_create(ignore_conflicts=True)` executed once at the end of the batch.
- Wrap the entire micro-batch in a single `rls_transaction` (previously 2N transactions for N findings). Deadlock retry now operates at the batch level.
- Populate `Finding.resource_regions`, `resource_services` and `resource_types` directly on INSERT, removing the post-INSERT `bulk_update` pass.
- Raise `SCAN_DB_BATCH_SIZE` from 500 to 1000.
- Add `update_fields=[...]` to `Scan` / `Provider` saves to avoid full-row writes.
- Throttle progress saves to either a 1% delta or a 10s interval (whichever comes first).
- Preserve findings with empty `resource_uid` (IaC scans, some Azure/GCP/K8s findings).
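The progress throttle in the list above ("1% delta or 10s interval, whichever comes first") can be sketched as a small helper. This is a minimal illustration, not the code in `scan.py`: the `ProgressThrottle` class is hypothetical, and it assumes progress is expressed as a fraction between 0.0 and 1.0.

```python
import time


class ProgressThrottle:
    """Decides whether a progress value is worth persisting (hypothetical helper)."""

    def __init__(self, min_delta=0.01, min_interval=10.0, clock=time.monotonic):
        self.min_delta = min_delta        # 1% of total progress
        self.min_interval = min_interval  # seconds between forced saves
        self.clock = clock                # injectable so tests can fake time
        self.last_value = 0.0
        self.last_time = clock()

    def should_save(self, progress):
        now = self.clock()
        moved_enough = progress - self.last_value >= self.min_delta
        waited_enough = now - self.last_time >= self.min_interval
        if moved_enough or waited_enough:
            # Remember what was persisted so the next check is relative to it.
            self.last_value = progress
            self.last_time = now
            return True
        return False
```

A caller would check `throttle.should_save(progress)` on each iteration and only write the progress field when it returns `True`, so slow scans still report at least every 10 seconds while fast scans do not write on every finding.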

No schema changes. No migrations. Behaviorally, the micro-batch is now atomic: errors that were previously masked by per-finding SAVEPOINTs may now surface in logs (the batch is retried on deadlock).
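To make the batch-level atomicity and retry concrete, here is a minimal sketch under stated assumptions: `sqlite3` stands in for PostgreSQL, `DeadlockDetected` is a hypothetical exception representing the database's deadlock error, and `run_batch_atomically` is an illustrative name rather than a function from the PR. A real implementation would likely also add a backoff between attempts.

```python
import sqlite3


class DeadlockDetected(Exception):
    """Hypothetical stand-in for the database's deadlock error."""


def run_batch_atomically(conn, batch_fn, max_retries=3):
    """Run batch_fn in one transaction; on deadlock, retry the whole batch."""
    for attempt in range(1, max_retries + 1):
        try:
            with conn:  # commits on success, rolls back on any exception
                batch_fn(conn)
            return attempt
        except DeadlockDetected:
            if attempt == max_retries:
                raise


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE findings (uid TEXT)")
calls = {"n": 0}


def flaky_batch(c):
    calls["n"] += 1
    c.execute("INSERT INTO findings VALUES ('f1')")
    c.execute("INSERT INTO findings VALUES ('f2')")
    if calls["n"] == 1:
        raise DeadlockDetected("simulated deadlock on first attempt")


attempts = run_batch_atomically(conn, flaky_batch)
row_count = conn.execute("SELECT COUNT(*) FROM findings").fetchone()[0]
```

Because the first attempt is rolled back as a unit, the retry does not leave partial rows behind. Any other exception is not caught by the wrapper and propagates, which matches the note that previously masked errors may now surface.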

Measured impact (3000 findings per micro-batch)

| Metric | Before | After | Delta |
| --- | --- | --- | --- |
| Wall-clock | 20.8s | 1.57s | 13.2x faster |
| `COMMIT` count | 6003 | 2 | -99.97% |
| `SELECT FOR UPDATE` on `resource_tag_mappings` | 15000 | 0 | -100% |
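The deltas in the table can be checked from the raw numbers. The short calculation below only uses figures stated in this PR; the variable names are illustrative.

```python
findings_per_batch = 3000

# Wall-clock speedup: 20.8s before, 1.57s after.
speedup = 20.8 / 1.57

# COMMIT reduction: 6003 before, 2 after.
commit_reduction_pct = round((6003 - 2) / 6003 * 100, 2)

# 6003 commits for 3000 findings is 2 per finding plus 3 others,
# consistent with the "2N" transactions mentioned in the description.
extra_commits = 6003 - 2 * findings_per_batch

# SELECT FOR UPDATE statements per finding before the change.
locks_per_finding = 15000 / findings_per_batch

print(f"{speedup:.2f}x, -{commit_reduction_pct}%, "
      f"{extra_commits} extra commits, {locks_per_finding:.0f} locks/finding")
```

The speedup works out to roughly 13.25, which the table reports as 13.2x, and the old path averaged five `SELECT FOR UPDATE` statements per finding.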

Steps to review

1. Read `api/src/backend/tasks/jobs/scan.py` end-to-end; the change is concentrated in a single file.
2. Confirm the bulk pre-resolution step covers all `Resource` / `ResourceTag` lookups previously done inside the loop.
3. Verify that the deferred `ResourceTagMapping.bulk_create(ignore_conflicts=True)` correctly replaces the previous per-mapping upsert path (idempotent on retry).
4. Check the new transaction boundary: one `rls_transaction` per micro-batch instead of per finding; confirm the deadlock retry path still re-runs the whole batch safely.
5. Confirm that empty `resource_uid` findings (IaC, some Azure/GCP/K8s) are kept and stored.
6. Run the API test suite focused on scan and reports tasks.
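For the "idempotent on retry" point in the review steps, the following sketch shows why a conflict-ignoring insert can be re-run safely. It uses SQLite's `INSERT OR IGNORE` as an analogue of the conflict-ignoring insert that `bulk_create(ignore_conflicts=True)` produces; the table layout and the `insert_mappings` helper are assumptions made for the example, not the project's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resource_tag_mappings ("
    "resource_id INTEGER, tag_id INTEGER, UNIQUE (resource_id, tag_id))"
)


def insert_mappings(conn, pairs):
    """Insert (resource_id, tag_id) pairs, silently skipping existing ones."""
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO resource_tag_mappings VALUES (?, ?)", pairs
        )
    return conn.execute("SELECT COUNT(*) FROM resource_tag_mappings").fetchone()[0]


pairs = [(1, 10), (1, 11), (2, 10)]
count_after_first_run = insert_mappings(conn, pairs)
count_after_retry = insert_mappings(conn, pairs)  # simulates a batch retry
```

Running the same batch twice leaves the table unchanged after the first run, so a retried micro-batch does not create duplicate mappings or fail on the unique constraint.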

Checklist

Community Checklist

This feature/issue is listed in here or roadmap.prowler.com
Is it assigned to me? If not, request it via the issue/feature in here or Prowler Community Slack.

Review if the code is being covered by tests.
Review if code is being documented following this specification https://github.com/google/styleguide/blob/gh-pages/pyguide.md#38-comments-and-docstrings
Review if backport is needed.
Review if the Readme.md needs to be changed.
Ensure new entries are added to CHANGELOG.md, if applicable.

SDK/CLI

Are there new checks included in this PR? No

API

All issue/task requirements work as expected on the API
Endpoint response output (if applicable)
EXPLAIN ANALYZE output for new/modified queries or indexes (if applicable)
Performance test results (if applicable)
Any other relevant evidence of the implementation (if applicable)
Verify if API specs need to be regenerated.
Check if version updates are required (e.g., specs, uv, etc.).
Ensure new entries are added to CHANGELOG.md, if applicable.

License

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

- Pre-resolve Resources and ResourceTags in bulk before the per-finding loop.
- Replace `Resource.upsert_or_delete_tags` with deferred `ResourceTagMapping.bulk_create(ignore_conflicts=True)` at end of batch (eliminates the per-mapping `SELECT FOR UPDATE`).
- Wrap the entire micro-batch in a single `rls_transaction` (was 2N); deadlock retry now per-batch.
- Populate `Finding.resource_regions/services/types` on INSERT, dropping the post-INSERT `bulk_update`.
- Raise `SCAN_DB_BATCH_SIZE` from 500 to 1000.
- Add `update_fields=[...]` to `Scan`/`Provider` saves; throttle progress saves to 1% delta or 10s.
- Preserve findings with empty `resource_uid` (IaC scans, some Azure/GCP/K8s).

Measured (3000 findings per micro-batch):
- Wall-clock 20.8s -> 1.57s (13.2x)
- COMMITs 6003 -> 2
- SELECT FOR UPDATE on resource_tag_mappings 15000 -> 0

No schema changes. No migrations. Micro-batches are now atomic: errors previously masked by per-finding SAVEPOINTs may surface in logs.

github-actions · 2026-05-20T08:35:08Z

✅ Conflict Markers Resolved

All conflict markers have been successfully resolved in this pull request.

github-actions · 2026-05-20T08:37:49Z

🔒 Container Security Scan

Image: prowler-api:bea8fc8
Last scan: 2026-05-21 11:41:27 UTC

📊 Vulnerability Summary

| Severity | Count |
| --- | --- |
| 🔴 Critical | 14 |
| Total | 14 |

12 package(s) affected

⚠️ Action Required

Critical severity vulnerabilities detected. These should be addressed before merging:

Review the detailed scan results
Update affected packages to patched versions
Consider using a different base image if updates are unavailable

📋 Resources:

Download full report (see artifacts)
View in Security tab
Scanned with Trivy

github-actions · 2026-05-20T08:38:25Z

✅ All necessary CHANGELOG.md files have been updated.

codecov · 2026-05-20T08:50:48Z

Codecov Report

❌ Patch coverage is 89.01734% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.95%. Comparing base (6eebfcf) to head (e5c13b2).
⚠️ Report is 33 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #11249      +/-   ##
==========================================
- Coverage   93.97%   93.95%   -0.02%     
==========================================
  Files         237      237              
  Lines       34829    34877      +48     
==========================================
+ Hits        32729    32770      +41     
- Misses       2100     2107       +7

| Flag | Coverage Δ | |
| --- | --- | --- |
| api | `93.95% <89.01%> (-0.02%)` | ⬇️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Components | Coverage Δ | |
| --- | --- | --- |
| prowler | `∅ <ø> (∅)` | |
| api | `93.95% <89.01%> (-0.02%)` | ⬇️ |

Davidm4r · 2026-05-21T11:09:36Z

+                    for m in created_tag_mappings:
+                        if m.pk is not None:


For created_tag_mappings you are using ignore_conflicts=True.
That means the database does not return the IDs of the inserted rows, so we will never enter this `if`, right?

AdriiiPRodri · 2026-05-21

Good catch. Fixed with a pre-SELECT of existing (resource_id, tag_id) pairs, so updated_at is bumped only on resources that actually gain a mapping.
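The reply only describes the approach, so the following is a hedged sketch of the idea rather than the actual fix: given the pairs about to be inserted and the pairs already present (as a pre-SELECT would return them), a set difference identifies which resources really gain a mapping. The function name and sample data are hypothetical.

```python
def resources_gaining_mappings(desired_pairs, existing_pairs):
    """Return the resource ids that get at least one new (resource_id, tag_id) pair."""
    new_pairs = set(desired_pairs) - set(existing_pairs)
    return {resource_id for resource_id, _tag_id in new_pairs}


# Pairs the batch wants to exist after the insert.
desired = [(1, 10), (1, 11), (2, 10)]
# Pairs the pre-SELECT found already in the table.
existing = [(1, 10), (2, 10)]

to_touch = resources_gaining_mappings(desired, existing)
```

Here only resource 1 gains a mapping, so only its `updated_at` would be bumped. This avoids relying on the primary keys of the objects returned by the conflict-ignoring insert.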

Davidm4r · 2026-05-21T11:10:42Z

+                    inserted = sum(1 for m in created_mappings if m.pk)
+                    if inserted != len(mappings_to_create):
+                        logger.error(
+                            f"scan {scan_instance.id}: expected "
+                            f"{len(mappings_to_create)} ResourceFindingMapping rows, "
+                            f"inserted {inserted}. Rolling back micro-batch."
+                        )


Following from the other comment: if pk is None, inserted will be 0.
We will log "expected N rows, inserted 0".

AdriiiPRodri · 2026-05-21

You're right: ignore_conflicts=True does not populate pk, so this branch fires on every successful batch. It is pre-existing from #10724, though, so I am keeping it as-is to scope this PR to the perf rewrite and opening a separate fix for the silent-failure detection.
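The false alarm discussed in this thread can be reproduced without a database. The sketch below assumes, as the thread states, that `pk` stays `None` on the returned objects; `FakeMapping` is a hypothetical stand-in for the model class. Models whose primary key is generated on the client side (for example a UUID default) would already have `pk` set and behave differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeMapping:
    """Stand-in for a model instance whose pk was never populated."""
    pk: Optional[int] = None


mappings_to_create = [FakeMapping() for _ in range(3)]
# Stand-in for the list returned by the conflict-ignoring bulk insert.
created_mappings = mappings_to_create

# Same counting logic as the reviewed snippet.
inserted = sum(1 for m in created_mappings if m.pk)
mismatch = inserted != len(mappings_to_create)
```

With every `pk` left as `None`, `inserted` is 0 and the mismatch branch is taken even though nothing failed, which is the behavior the reviewer points out.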

AdriiiPRodri added 2 commits May 15, 2026 13:51

Merge remote-tracking branch 'origin/master' into perf/api-scan-loop-…

66895fb

…bulk-operations

AdriiiPRodri requested a review from a team as a code owner May 20, 2026 08:34

AdriiiPRodri added the no-changelog Skip including change in changelog/release notes label May 20, 2026

github-actions Bot added the component/api label May 20, 2026

docs(api): changelog entry for scan hot-loop perf in 1.29.0

1ad8863

AdriiiPRodri removed the no-changelog Skip including change in changelog/release notes label May 20, 2026

style(api): drop decorative banners and tidy comments in scan hot loop

8623129

AdriiiPRodri force-pushed the perf/api-scan-loop-bulk-operations branch from 23e53a2 to 8623129 Compare May 20, 2026 08:51

AdriiiPRodri added 2 commits May 20, 2026 11:03

Merge branch 'master' into perf/api-scan-loop-bulk-operations

1fa493e

chore: remove comment

ce4dd8a

Davidm4r reviewed May 21, 2026

fix(api): bump updated_at on resources that gain tag mappings

e5c13b2

Davidm4r approved these changes May 21, 2026

Davidm4r self-requested a review May 21, 2026 13:18

AdriiiPRodri wants to merge 7 commits into master from perf/api-scan-loop-bulk-operations

2 participants
