feat(ASI): add behavioral trust evidence type specification #819
0xbrainkid wants to merge 3 commits into
Adds a formal evidence type definition for behavioral trust scoring as a control class input to ASI01-ASI03 runtime enforcement. Includes:
- Evidence structure (trust_score, drift_status, verification)
- Trust scoring formula: success_rate × confidence(volume), per-task-class
- Enforceability classification: strong / bounded / detectable-only
- Integration points: OWASP admissibility predicates, mutation boundary (MITRE), TrustProvider interface (W3C/LangGraph)
- Security considerations: Sybil resistance, temporal decay, cold-start
Referenced from discussion OWASP#802 (Runtime Enforcement Mapping).
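As a rough illustration of the scoring formula above, here is a minimal Python sketch. The shape of `confidence` is not given in this description, so the saturating curve `volume / (volume + half_point)`, the `half_point` parameter, and the `TaskClassStats` container are all hypothetical choices made only to produce a runnable example. Returning zero at cold start is likewise an assumption.

```python
from dataclasses import dataclass


def confidence(volume: int, half_point: int = 50) -> float:
    """Hypothetical saturating curve: 0.0 at zero volume, 0.5 at half_point, approaching 1.0."""
    if volume < 0:
        raise ValueError("volume must be non-negative")
    return volume / (volume + half_point)


@dataclass(frozen=True)
class TaskClassStats:
    """Observed outcomes for ONE task class (assumed container, not from the spec)."""
    task_class: str
    successes: int
    total: int


def trust_score(stats: TaskClassStats) -> float:
    """success_rate x confidence(volume), computed for a single task class."""
    if stats.total == 0:
        return 0.0  # assumed cold-start behavior: no observations, no trust
    success_rate = stats.successes / stats.total
    return success_rate * confidence(stats.total)
```

With these assumed parameters, 45 successes out of 50 payment calls gives a success rate of 0.9 and a confidence of 0.5, so the score is 0.45. The calculation is deterministic and needs no model, which matches the "auditable" intent of the formula.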
…table fingerprint
Per @desiorac's review in issue OWASP#802: baseline_version (opaque string) was insufficient because it doesn't provide the monotonic reference property needed for replay-verifiable proofs. Replaced with:
- baseline_snapshot_hash: SHA-256 of canonicalized baseline (JCS)
- baseline_snapshot_ts: when baseline was computed
- Added MUST requirement: verifiers reject enforcement-mode evidence without this field
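A minimal sketch of how the hash and the verifier check might look in Python. It assumes the evidence is a plain dict, that the mode is passed as the string "enforcement", and that "this field" refers to baseline_snapshot_hash; none of those details are stated in the commit. The canonicalization shown is only an approximation of JCS (RFC 8785): sorted keys and compact separators agree with JCS for simple ASCII-keyed objects with integer and string values, but a conforming implementation would need the RFC's number serialization and UTF-16 key ordering.

```python
import hashlib
import json


def baseline_snapshot_hash(baseline: dict) -> str:
    """SHA-256 hex digest over an approximately canonical JSON encoding of the baseline."""
    # Approximates JCS: sorted keys, no insignificant whitespace, UTF-8 output.
    canonical = json.dumps(
        baseline, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def accept_evidence(evidence: dict, mode: str) -> bool:
    """Reject enforcement-mode evidence that lacks a baseline snapshot hash."""
    if mode == "enforcement" and not evidence.get("baseline_snapshot_hash"):
        return False
    return True
```

Because the encoding is order-independent, two verifiers holding the same baseline material derive the same fingerprint, which is what makes the reference usable for replay verification.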
Per @desiorac review: high-volume read-only calls inflate confidence for the whole agent, allowing write/payment operations to reach 'strong' enforceability on borrowed trust. Fix: gates MUST evaluate task_class-scoped evidence. cross_class_score is optional for display but MUST NOT be used for enforcement decisions. Enforceability tier is now task-class-specific.
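The scoping rule can be sketched as a gate that looks up evidence only under the current operation's task class. The function name `admit`, the mapping layout, and the threshold are hypothetical. Failing closed when no evidence exists for the class is an assumption of this sketch, not something the commit specifies.

```python
from typing import Mapping, Optional


def admit(
    operation_class: str,
    score_by_task_class: Mapping[str, float],
    threshold: float,
    cross_class_score: Optional[float] = None,
) -> bool:
    """Decide admissibility using only evidence scoped to operation_class."""
    # cross_class_score is accepted so callers can pass it for display,
    # but it is deliberately never read in the enforcement decision.
    score = score_by_task_class.get(operation_class)
    if score is None:
        return False  # assumed fail-closed: no evidence for this class
    return score >= threshold


# An agent with a strong read-only history but no payment history:
scores = {"read": 0.95}
print(admit("read", scores, threshold=0.7))  # True
print(admit("payment", scores, threshold=0.7, cross_class_score=0.95))  # False
```

The second call shows the borrowed-trust case being refused: a high aggregate score does not help because the gate never consults it.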
QueBallSharken
left a comment
Agreed. This is the correct constraint.
Behavioral trust only has enforcement value if it is scoped to the operation class being authorized. High-volume low-risk activity must not inflate admissibility for write/payment/mutation classes.
So I support making this normative:
- task_class is required for gate evaluation
- the evaluated evidence must match the current operation type
- cross_class_score may be retained for display or analysis, but MUST NOT be used for enforcement decisions
That keeps the trust artifact aligned to the guarded primitive and prevents cross-class borrowing of trust.
QueBallSharken
left a comment
Approved.
This closes an actual gap from #802 by turning behavioral trust into a defined evidence type instead of leaving it as an implied concept. The important part is that the enforcement path is now constrained correctly:
- trust evidence is evaluated at the mutation boundary
- "task_class" is required for gate evaluation
- enforcement is bound to the current operation class
- "cross_class_score" is not allowed to influence admissibility
That keeps trust aligned to the guarded primitive and prevents cross-class borrowing, which is the main failure mode this needed to avoid.
I also agree with the follow-up fixes:
- "baseline_snapshot_hash" being explicitly SHA-256 of immutable baseline material
- per-task-class gate evaluation being made normative
Those changes make the artifact more auditable, more deterministic, and more usable as a real enforcement input rather than just descriptive metadata.
Approve.
Clarifying the boundary
This is a useful addition, but I want to keep one distinction explicit in the record. A behavioral trust evidence type is a control/evidence input to admissibility at a guarded boundary. It is not by itself the same thing as the broader architecture question BBIS is trying to keep visible. The unresolved BBIS-class question is whether the governing invariant remains live across all mutation-capable boundaries until the true irreversible mutation authority / primitive, rather than only being checked locally at one boundary. So I would frame this as:
That distinction matters because otherwise a valid local control can get read as if it settles the larger architectural continuity question, and it does not.
One thing I’d still tighten in the record is the claim boundary around the enforceability tiers. In BBIS terms, this PR now does a good job defining behavioral trust as a scoped enforcement input at a guarded boundary. What it should still say explicitly is what each tier is and is not claiming. In particular:
Related to that, I think the freshness rule still matters: what invalidates behavioral trust evidence, when reevaluation is required, and whether drift forces downgrade or fail-closed behavior at the boundary.
My main reason for pushing this is just to keep a useful local control from being overread as if it settles the larger BBIS-class continuity question. It does not need to do that to still be a good addition.
Summary
Adds a formal behavioral trust evidence type specification to the ASI agentic top 10 documentation. This addresses the gap identified in issue #802 around formalizing behavioral trust as a control class for runtime enforcement.
Per @desiorac's invitation in issue #802.
What it adds
A standardized evidence type for expressing agent behavioral trustworthiness as an input to admissibility predicates at mutation boundaries.
Trust scoring formula: success_rate × confidence(volume), scoped per task class. It is deterministic and auditable, and no ML is required.
Connections to existing work