
feat(ASI): add behavioral trust evidence type specification#819

Open
0xbrainkid wants to merge 3 commits into OWASP:main from 0xbrainkid:feat/behavioral-trust-evidence-type


@0xbrainkid

Summary

Adds a formal behavioral trust evidence type specification to the OWASP Agentic Top 10 (ASI) documentation. It addresses the gap identified in issue #802: formalizing behavioral trust as a control class for runtime enforcement.

Per @desiorac's invitation in issue #802.

What it adds

A standardized evidence type for expressing agent behavioral trustworthiness as an input to admissibility predicates at mutation boundaries.

  • Evidence structure: trust_score, derivation, drift_status, verification
  • Trust scoring formula: success_rate × confidence(volume), scoped per task class; deterministic, auditable, and requiring no ML
  • Three enforceability tiers:
    • Strong (on-chain atomic, ~400ms)
    • Bounded (version-anchored with configurable TTL)
    • Detectable-only (self-declared or signed without anchor)
  • Integration points: OWASP admissibility predicates, the mutation boundary gate (from the issue #802 discussion, "Runtime Enforcement Mapping for OWASP Agentic Top 10 (ASI01–ASI10)"), the W3C TrustProvider interface, LangGraph multi-provider trust
  • Security considerations: Sybil resistance via the confidence function, temporal decay, cold-start delegation chain
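The scoring formula above can be sketched as follows. This is a minimal illustration, not the spec's reference implementation: the PR names `confidence(volume)` but does not define it here, so the saturating `n / (n + K)` form and the constant `K = 20` are hypothetical, as are the `TaskClassHistory` type, the `build_evidence` helper, and the shapes of `derivation`, `drift_status` and `verification`.

```python
from dataclasses import dataclass

# HYPOTHETICAL: the PR names confidence(volume) but does not define it.
# n / (n + K) is one monotone choice: 0 with no history, approaching 1 as volume grows.
K = 20  # hypothetical saturation constant


def confidence(volume: int) -> float:
    """Hypothetical confidence(volume) in [0, 1)."""
    if volume < 0:
        raise ValueError("volume must be non-negative")
    return volume / (volume + K)


@dataclass(frozen=True)
class TaskClassHistory:
    """Observed outcomes for one agent within a single task class (hypothetical type)."""
    task_class: str
    successes: int
    attempts: int


def trust_score(history: TaskClassHistory) -> float:
    """trust_score = success_rate * confidence(volume), computed per task class."""
    if history.attempts == 0:
        return 0.0  # cold start: no history in this class, no trust
    success_rate = history.successes / history.attempts
    return success_rate * confidence(history.attempts)


def build_evidence(history: TaskClassHistory, drift_status: str, verification: dict) -> dict:
    """Assemble the listed evidence fields; drift_status/verification formats are assumed."""
    return {
        "task_class": history.task_class,
        "trust_score": trust_score(history),
        "derivation": {
            "formula": "success_rate * confidence(volume)",
            "successes": history.successes,
            "attempts": history.attempts,
        },
        "drift_status": drift_status,
        "verification": verification,
    }
```

Under this hypothetical confidence function, ten straight successes score 1.0 × 10/30 ≈ 0.33, while 950 successes in 1000 attempts score 0.95 × 1000/1020 ≈ 0.93. A fresh identity with a short, perfect record therefore cannot outscore an established one, which is one way a confidence term can contribute to Sybil resistance.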

Connections to existing work

Adds a formal evidence type definition for behavioral trust scoring
as a control class input to ASI01-ASI03 runtime enforcement.

Includes:
- Evidence structure (trust_score, drift_status, verification)
- Trust scoring formula: success_rate × confidence(volume), per-task-class
- Enforceability classification: strong / bounded / detectable-only
- Integration points: OWASP admissibility predicates, mutation boundary (MITRE), TrustProvider interface (W3C/LangGraph)
- Security considerations: Sybil resistance, temporal decay, cold-start

Referenced from discussion OWASP#802 (Runtime Enforcement Mapping).
brainGROWTH added 2 commits April 7, 2026 18:54
…table fingerprint

Per @desiorac's review in issue OWASP#802: baseline_version (opaque string)
was insufficient because it doesn't provide the monotonic reference
property needed for replay-verifiable proofs.

Replaced with:
- baseline_snapshot_hash: SHA-256 of the canonicalized baseline (JCS)
- baseline_snapshot_ts: timestamp at which the baseline was computed
- Added MUST requirement: verifiers MUST reject enforcement-mode evidence that lacks this field
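A sketch of how `baseline_snapshot_hash` might be produced and replay-checked. Assumptions: `canonicalize` only approximates JCS (RFC 8785) with Python's `json.dumps`; the `mode` field used to mark enforcement-mode evidence and the `snapshot_baseline` / `verify_evidence` helpers are hypothetical names, not part of the PR.

```python
import hashlib
import json
from datetime import datetime, timezone


def canonicalize(obj) -> bytes:
    """APPROXIMATION of JCS (RFC 8785): sorted keys, no insignificant whitespace, UTF-8.

    Full JCS also mandates ECMAScript number serialization and UTF-16 key ordering;
    this should only match it for ASCII keys with string, boolean, null and
    small-integer values. Use a dedicated JCS implementation otherwise.
    """
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def snapshot_baseline(baseline: dict) -> dict:
    """Produce the baseline_snapshot_hash / baseline_snapshot_ts pair."""
    return {
        "baseline_snapshot_hash": hashlib.sha256(canonicalize(baseline)).hexdigest(),
        "baseline_snapshot_ts": datetime.now(timezone.utc).isoformat(),
    }


def verify_evidence(evidence: dict, baseline: dict) -> None:
    """Reject enforcement-mode evidence lacking the hash, then replay-check it."""
    if evidence.get("mode") != "enforcement":  # "mode" is a hypothetical field name
        return
    claimed = evidence.get("baseline_snapshot_hash")
    if not claimed:
        raise ValueError("enforcement-mode evidence without baseline_snapshot_hash")
    if claimed != hashlib.sha256(canonicalize(baseline)).hexdigest():
        raise ValueError("baseline_snapshot_hash does not match the supplied baseline")
```

Because the hash is taken over canonical bytes, two serializations of the same baseline with different key order produce the same fingerprint. An opaque `baseline_version` string cannot offer that property to an independent verifier.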

Per @desiorac's review: high-volume read-only calls inflate confidence
for the whole agent, allowing write/payment operations to reach 'strong'
enforceability on borrowed trust.

Fix: gates MUST evaluate task_class-scoped evidence. cross_class_score
is optional for display but MUST NOT be used for enforcement decisions.
Enforceability tier is now task-class-specific.
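The task-class rule can be illustrated with a minimal gate. The `BehavioralTrustEvidence` type, the `admit` function and the `threshold` parameter are hypothetical; the sketch assumes a fail-closed policy when the evidence's `task_class` does not match the operation being authorized.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BehavioralTrustEvidence:
    task_class: str                            # class the score was derived for
    trust_score: float                         # task_class-scoped score
    cross_class_score: Optional[float] = None  # display/analysis only; never read by the gate


def admit(evidence: BehavioralTrustEvidence, operation_class: str, threshold: float) -> bool:
    """Hypothetical mutation-boundary gate: evaluates only evidence scoped to operation_class."""
    if not evidence.task_class:
        raise ValueError("task_class is required for gate evaluation")
    if evidence.task_class != operation_class:
        return False  # evidence for another class cannot authorize this one (fail closed)
    return evidence.trust_score >= threshold
```

With this gate, a high score earned on `read` operations cannot authorize a `payment`, and a high `cross_class_score` does not rescue a weak payment-scoped score, because the gate never reads that field.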

@QueBallSharken left a comment

Agreed. This is the correct constraint.

Behavioral trust only has enforcement value if it is scoped to the operation class being authorized. High-volume low-risk activity must not inflate admissibility for write/payment/mutation classes.

So I support making this normative:

  • task_class is required for gate evaluation
  • the evaluated evidence must match the current operation type
  • cross_class_score may be retained for display or analysis, but MUST NOT be used for enforcement decisions

That keeps the trust artifact aligned to the guarded primitive and prevents cross-class borrowing of trust.

@QueBallSharken left a comment

Approved.

This closes an actual gap from #802 by turning behavioral trust into a defined evidence type instead of leaving it as an implied concept. The important part is that the enforcement path is now constrained correctly:

  • trust evidence is evaluated at the mutation boundary
  • "task_class" is required for gate evaluation
  • enforcement is bound to the current operation class
  • "cross_class_score" is not allowed to influence admissibility

That keeps trust aligned to the guarded primitive and prevents cross-class borrowing, which is the main failure mode this needed to avoid.

I also agree with the follow-up fixes:

  • "baseline_snapshot_hash" being explicitly SHA-256 of immutable baseline material
  • per-task-class gate evaluation being made normative

Those changes make the artifact more auditable, more deterministic, and more usable as a real enforcement input rather than just descriptive metadata.

Approve.

@QueBallSharken

Clarifying the boundary

This is a useful addition, but I want to keep one distinction explicit in the record.

A behavioral trust evidence type is a control/evidence input to admissibility at a guarded boundary.

It is not by itself the same thing as the broader architecture question BBIS is trying to keep visible.

The unresolved BBIS-class question is whether the governing invariant remains live across all mutation-capable boundaries up to the true irreversible mutation authority / primitive, rather than only being checked locally at one boundary.

So I would frame this as:

  • a useful enforcement input
  • compatible with stronger mutation-bound governance models
  • not a substitute for boundary-to-boundary invariant survival analysis or conformance

That distinction matters because otherwise a valid local control can get read as if it settles the larger architectural continuity question, and it does not.

@QueBallSharken

One thing I’d still tighten in the record is the claim boundary around the enforceability tiers.

In BBIS terms, this PR now does a good job defining behavioral trust as a scoped enforcement input at a guarded boundary. What it should still say explicitly is what each tier is and is not claiming.

In particular:

  • "Strong" should be read as strong for the guarded boundary and stated enforcement context, not as a claim that the same governing invariant remained live across all mutation-capable boundaries to the true irreversible mutation authority / primitive.
  • "Bounded" should state the scope and freshness assumptions under which the trust evidence is being used.
  • "Detectable-only" should remain clearly non-preventive.

Related to that, I think the freshness rule still matters: what invalidates behavioral trust evidence, when reevaluation is required, and whether drift forces downgrade or fail-closed behavior at the boundary.

My main reason for pushing this is just to keep a useful local control from being overread as if it settles the larger BBIS-class continuity question. It does not need to do that to still be a good addition.
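To make the freshness questions concrete, here is one hypothetical policy a boundary could apply. Nothing in it is defined by the PR: the `Decision` values, the `gate_decision` signature, the `"stable"` drift value, the choice to fail closed on drift, and the assumption that Strong-tier evidence is checked atomically at the boundary (so no TTL applies) are all illustrative.

```python
import time
from enum import Enum
from typing import Optional


class Tier(Enum):
    STRONG = "strong"
    BOUNDED = "bounded"
    DETECTABLE_ONLY = "detectable-only"


class Decision(Enum):
    ADMIT = "admit"
    DENY = "deny"
    REEVALUATE = "reevaluate"    # evidence too old: fetch fresh evidence before deciding
    RECORD_ONLY = "record-only"  # non-preventive: record for detection, never gate on it


def gate_decision(tier: Tier, trust_score: float, threshold: float,
                  evidence_ts: float, ttl_s: float, drift_status: str,
                  now: Optional[float] = None) -> Decision:
    """One hypothetical freshness/drift policy for a single guarded boundary."""
    now = time.time() if now is None else now
    if tier is Tier.DETECTABLE_ONLY:
        return Decision.RECORD_ONLY   # stays non-preventive regardless of score
    if drift_status != "stable":      # hypothetical value; drift fails closed here
        return Decision.DENY
    if tier is Tier.BOUNDED and now - evidence_ts > ttl_s:
        return Decision.REEVALUATE    # outside the stated TTL, evidence is stale
    # STRONG is assumed to be evaluated atomically at the boundary, so no TTL check.
    return Decision.ADMIT if trust_score >= threshold else Decision.DENY
```

Even an `ADMIT` here is a decision about one guarded boundary only; it says nothing about whether the same invariant holds at later mutation-capable boundaries.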
