Alignment and Integrity Infrastructure for Autonomous Agents
Transparency, Real-Time Oversight, and Cryptographic Proof for Agent Behavior
Version: 3.0 Date: February 2026 Authors: Mnemom Research License: CC BY 4.0

Abstract
We present three complementary layers of infrastructure that together make autonomous agent behavior observable, auditable, and cryptographically verifiable: the Agent Alignment Protocol (AAP), the Agent Integrity Protocol (AIP), and a cryptographic attestation architecture that provides independently verifiable proof of oversight integrity. As AI agents transition from assistants to autonomous actors — negotiating, transacting, and coordinating on behalf of humans — the question of alignment becomes infrastructure-critical. Current agent coordination protocols (A2A, MCP, AP2) address capability discovery, tool integration, and payment authorization, but none address the fundamental question: Is this agent serving its principal’s interests? AAP provides transparency: Alignment Cards declare an agent’s alignment posture, AP-Traces create auditable records of decisions, and Value Coherence Handshakes verify compatibility before multi-agent coordination. AIP provides integrity: thinking blocks are analyzed between turns, before actions execute, producing Integrity Checkpoints with structured verdicts that enable side-channel intervention. The cryptographic attestation layer provides proof: Ed25519 signatures, SHA-256 hash chains, Merkle accumulator trees, and zero-knowledge proofs via SP1 zkVM that make integrity verdicts independently verifiable without requiring trust in the attestation service. The three layers share a single Alignment Card, a single session identity, and a single drift detection framework. AAP answers what did this agent do? AIP answers what is this agent thinking? The proof layer answers can we verify the oversight itself? Together they close the loop from observation through analysis to cryptographic evidence — defense in depth on a shared behavioral contract. The critical contribution of the proof layer is the distinction between proving model inference and proving auditor judgment. 
Rather than attempting to prove that an LLM was called correctly (prohibitively expensive at current technology levels), we prove that the deterministic verdict derivation rules were applied honestly to the LLM’s analysis output. This makes the system practical: ~$0.001 per proof, sub-second proving time (GPU-accelerated), under 100ms verification — while providing a meaningful cryptographic guarantee that the oversight process was not fabricated, modified, or selectively applied. AAP, AIP, and the proof layer are transparency and verification infrastructure, not trust infrastructure. They make agent behavior observable, they make oversight integrity verifiable, and they are honest about what they cannot prove. This distinction is fundamental.

1. Introduction
1.1 The Problem
For decades, competitive advantage in digital commerce derived from information asymmetry. Platforms accumulated knowledge about users — search history, purchase patterns, social connections — and leveraged this asymmetry to extract value. This paradigm is collapsing. When AI agents act on behalf of humans, they bring equal reasoning power, equal tool access, and equal information retrieval capabilities. An AI agent shopping for a consumer has the same access to product databases, reviews, and pricing data as the merchant’s AI. The asymmetry that powered trillion-dollar advertising businesses — Google ($116B), Amazon ($56B) — evaporates when agents negotiate with agents. When capability is symmetric, alignment becomes the differentiator. The market recognizes this gap. McKinsey’s 2025 survey identifies “lack of governance and risk-management tools” as the #1 barrier to AI adoption. PwC reports 28% of executives cite “lack of trust in AI agents” as their top challenge. Bain finds 50% of consumers remain cautious of fully autonomous purchasing.

1.2 The Gap in the Current Stack
The industry has built protocols for agent coordination:

| Protocol | Function | Limitation |
|---|---|---|
| MCP (Model Context Protocol) | Agent-to-tool connectivity | No alignment semantics |
| A2A (Agent-to-Agent Protocol) | Task negotiation | No value verification |
| AP2 (Agent Payments Protocol) | Payment authorization | No decision audit trail |
None of these protocols answers the alignment questions that matter:
- Is this agent serving its principal (the human who delegated authority)?
- What alternatives did it consider before this decision?
- When will it escalate versus act autonomously?
- Are its values compatible with my agent’s values for this coordination?
- What is it thinking before it acts?
- Can we prove the oversight was conducted honestly?
1.3 Design Principles
The infrastructure shares seven design principles:

- Transparency over guarantee: Make decisions observable, not provably correct
- Composability: Extend existing protocols rather than replace them
- Honest limitations: Be explicit about what cannot be provided
- Empirical calibration: Derive thresholds from observed behavior, not theory
- Defense in depth: Each layer is one component of a multi-layer oversight system, never a standalone guarantee
- Daimonion philosophy: Silence means aligned; voice means outside boundaries. The integrity system never commands, only warns.
- Trust the math, not the service: Verification endpoints are public, certificates are self-describing, and every cryptographic check can be reproduced independently. Security relies on mathematical guarantees, not access control.
1.4 Three Layers, One Alignment Card
AAP, AIP, and the cryptographic attestation layer are designed as three complementary layers — distinct in timescale and function, unified by a shared behavioral contract:

| Dimension | AAP | AIP | Proof |
|---|---|---|---|
| Timescale | Post-hoc (seconds to minutes) | Real-time (between turns) | Asynchronous (seconds to minutes) |
| Input | AP-Traces (logged decisions) | Thinking blocks (internal reasoning) | Integrity Checkpoints (AIP output) |
| Output | Verification Results, Drift Alerts | Integrity Checkpoints, Signals | Certificates, STARK Proofs |
| Question | What did this agent do? | What is this agent thinking? | Can we prove the oversight was honest? |
| Guarantee | Behavioral consistency audit | Intent analysis before action | Cryptographic verification of verdict derivation |
| Shared primitive | Alignment Card | Alignment Card | Alignment Card (hashed into commitments) |
2. Shared Primitives
2.1 Alignment Card
An Alignment Card is a structured declaration of an agent’s alignment posture. It answers: What does this agent claim about itself?

- Machine-readable: JSON schema with full validation
- Extensible: Protocol-specific extensions via an `extensions` block
- Expirable: Cards may have `expires_at` timestamps requiring renewal
- Versionable: An `aap_version` field enables protocol evolution
- Discoverable: Served at `/.well-known/alignment-card.json`
- Revocable: Revocation list at `/.well-known/alignment-card-revocations.json`
Existing agent manifests can also embed the card under an `alignment` block, enabling adoption without abandoning existing infrastructure.
All three layers evaluate against the same active Alignment Card. When the card is rotated, all layers reference the new card. When the card expires, all layers cease evaluation until a new card is issued. The card_id field links every AP-Trace entry, every Integrity Checkpoint, and every Integrity Certificate back to the specific card in effect. In the proof layer, the Alignment Card is cryptographically committed — its SHA-256 hash is bound into every certificate, ensuring that the card cannot be swapped after the fact.
2.2 Principal and Delegation
The `principal` block declares who the agent serves and the nature of that service:

- `principal.type`: `human`, `organization`, `agent`, or `unspecified`. When type is `agent`, delegation chains are formed — Agent A delegates to Agent B, both maintaining cards.
- `principal.relationship`: `delegated_authority` (agent acts within bounds set by principal), `advisory` (agent recommends, human decides), or `autonomous` (agent operates within declared values without ongoing principal direction).
- `principal.escalation_contact`: An endpoint (mailto:, HTTPS webhook) for escalation notifications, ensuring the principal remains reachable.
2.3 Autonomy Envelope
The autonomy envelope defines the behavioral boundaries within which the agent may operate:

- `bounded_actions`: The set of actions the agent may take without escalation. Anything outside this set is an autonomy violation.
- `forbidden_actions`: Actions the agent must never take, regardless of context.
- `escalation_triggers`: Conditions that require escalation to the principal, with associated actions (`escalate`, `deny`, `log`). Conditions are expressed in a minimal expression language (e.g., `purchase_value > 100`, `shares_personal_data`).
- `max_autonomous_value`: Currency-denominated ceiling on autonomous financial action.
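As a concrete illustration, a minimal envelope check might look like the sketch below. The field names follow the spec text above; the schema details and action names are hypothetical.

```typescript
// Hypothetical Alignment Card fragment. Field names follow the spec text;
// the exact schema and action vocabulary are illustrative.
type EscalationTrigger = { condition: string; action: "escalate" | "deny" | "log" };

interface AutonomyEnvelope {
  bounded_actions: string[];
  forbidden_actions: string[];
  escalation_triggers: EscalationTrigger[];
  max_autonomous_value: { amount: number; currency: string };
}

const envelope: AutonomyEnvelope = {
  bounded_actions: ["search_products", "compare_prices", "purchase"],
  forbidden_actions: ["share_personal_data"],
  escalation_triggers: [{ condition: "purchase_value > 100", action: "escalate" }],
  max_autonomous_value: { amount: 100, currency: "USD" },
};

// Minimal envelope check: forbidden beats bounded; unknown actions are
// autonomy violations because bounded_actions is a whitelist.
function classifyAction(
  env: AutonomyEnvelope,
  action: string
): "allowed" | "forbidden" | "unbounded" {
  if (env.forbidden_actions.includes(action)) return "forbidden";
  if (!env.bounded_actions.includes(action)) return "unbounded";
  return "allowed";
}
```

The whitelist semantics matter: an action absent from both lists is still a violation, matching the spec’s “anything outside this set” rule.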
3. Agent Alignment Protocol (AAP) — Post-Hoc Verification
For the full AAP specification, see AAP Specification.
3.1 AP-Trace
An AP-Trace is an audit log entry recording a single agent decision. It answers: What did this agent actually consider and do?

3.2 Verification Architecture
AAP provides three verification functions:

| Function | Input | Output | Purpose |
|---|---|---|---|
| verify_trace | AP-Trace, Alignment Card | VerificationResult | Check trace consistency with card |
| check_coherence | Card A, Card B | CoherenceResult | Assess multi-agent compatibility |
| detect_drift | Card, Trace sequence | DriftAlert[] | Identify behavioral divergence over time |
verify_trace performs seven checks:

- Card reference: Does `trace.card_id` match `card.card_id`?
- Card expiration: Is the card still valid at the trace timestamp?
- Autonomy compliance: Is `action.name` in `bounded_actions`?
- Forbidden compliance: Is `action.name` NOT in `forbidden_actions`?
- Escalation compliance: If a trigger matched, was `escalation.required = true`?
- Value consistency: Is `values_applied` a subset of declared `values`?
- Behavioral similarity: How similar is trace reasoning to declared value semantics?
| Violation Type | Severity | Description |
|---|---|---|
| FORBIDDEN_ACTION | CRITICAL | Action explicitly prohibited |
| UNBOUNDED_ACTION | HIGH | Action outside declared autonomy |
| MISSED_ESCALATION | HIGH | Escalation trigger fired but not escalated |
| UNDECLARED_VALUE | MEDIUM | Applied value not declared |
| CARD_EXPIRED | MEDIUM | Alignment Card expired |
| CARD_MISMATCH | LOW | Trace references different card |
3.3 Value Coherence Handshake
Before two agents coordinate, they exchange Alignment Cards and check value compatibility. Coordination proceeds only when `coherence_score >= 0.70` and `conflicts_count == 0`.
Example: Agent A declares values [principal_benefit, transparency, minimal_data]. Agent B declares [principal_benefit, transparency, comprehensive_analytics]. The conflict between minimal_data and comprehensive_analytics yields a score of 0.556, triggering escalation to human principals before coordination proceeds.
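The handshake gate can be sketched as below. The spec’s actual coherence scoring function is not reproduced here; a placeholder set-overlap score stands in for it, so this sketch does not reproduce the 0.556 figure above, only the gating logic.

```typescript
interface CoherenceResult {
  coherence_score: number;
  conflicts_count: number;
}

// Placeholder score: shared values over the union of declared values.
// The spec's real scoring function (which also weighs semantic conflicts)
// is assumed, not reproduced.
function overlapScore(a: string[], b: string[]): number {
  const shared = a.filter((v) => b.includes(v)).length;
  const union = new Set([...a, ...b]).size;
  return union === 0 ? 1 : shared / union;
}

// Gate from the spec: proceed only above threshold with zero conflicts.
function handshakeProceeds(a: string[], b: string[], conflictsCount: number): boolean {
  const result: CoherenceResult = {
    coherence_score: overlapScore(a, b),
    conflicts_count: conflictsCount,
  };
  return result.coherence_score >= 0.7 && result.conflicts_count === 0;
}
```

On the example above, either condition alone blocks coordination: the minimal_data / comprehensive_analytics conflict fails the zero-conflicts test regardless of score.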
3.4 Drift Detection
Drift detection identifies when agent behavior diverges from declared alignment over multiple sessions. The algorithm uses Self-Similarity Matrix analysis (Section 8):

- Similarity threshold: 0.30
- Sustained turns threshold: 3

Drift is classified into four categories:

- `value_drift`: Using undeclared values (>30% of recent traces)
- `autonomy_expansion`: Escalation rate dropped by >50%
- `principal_misalignment`: Declining confidence on `principal_benefit` actions
- `unknown`: Pattern doesn’t match known categories
4. Agent Integrity Protocol (AIP) — Real-Time Assurance
For the full AIP specification, see AIP Specification.
4.1 The Real-Time Gap
AAP provides comprehensive post-hoc transparency. Every agent action is traced, verified against an Alignment Card, and checked for behavioral drift. But this verification happens after the action has already been taken — typically 30–60 seconds later when the Observer Worker processes the gateway log. This delay creates a window of vulnerability:

- Prompt injection: A malicious prompt causes the agent to exfiltrate data. The trace records it faithfully, but the data is already gone.
- Gradual drift: An agent’s reasoning shifts subtly over multiple turns. Each individual action passes verification, but the cumulative direction is misaligned.
- Misaligned intent: The agent’s thinking block reveals it intends to take a forbidden action, but the thinking block is only examined retroactively.
4.2 Integrity Checkpoints
The Integrity Checkpoint is AIP’s primary data structure — analogous to AAP’s AP-Trace. While an AP-Trace records what happened, an Integrity Checkpoint records what the analysis engine concluded about what the agent was thinking. Verdict types:

- `clear`: The thinking block is consistent with the Alignment Card. The daimonion is silent. Proceed.
- `review_needed`: The thinking block raises concerns that do not constitute hard violations. The daimonion speaks a warning. May proceed, should log.
- `boundary_violation`: The thinking block indicates intent to violate a hard boundary. The daimonion speaks forcefully. Should not proceed without escalation.
Non-clear verdicts carry structured concerns in six categories:

- `prompt_injection`: Signs of responding to injected instructions rather than the user’s request
- `value_misalignment`: Reasoning contradicts values declared in the Alignment Card
- `autonomy_violation`: Intent to act outside the autonomy envelope
- `reasoning_corruption`: Incoherent, circular, or degraded reasoning quality
- `deceptive_reasoning`: Intent to present misleading information or hide actions
- `undeclared_intent`: Goals or intentions not declared in the card or user request

Each concern includes a severity (low, medium, high, critical), a human-readable description, direct evidence quoted from the thinking block, and references to the relevant Alignment Card field or conscience value.
Privacy by design: The full thinking block text is never stored in the checkpoint. A SHA-256 hash (thinking_block_hash) enables correlation without retaining private reasoning.
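The correlation hash is straightforward to compute; a sketch using Node’s built-in crypto:

```typescript
import { createHash } from "node:crypto";

// Correlate checkpoints to thinking blocks without storing the text:
// only the SHA-256 digest (the spec's thinking_block_hash field) is kept.
function thinkingBlockHash(thinking: string): string {
  return createHash("sha256").update(thinking, "utf8").digest("hex");
}
```

Anyone holding the original thinking block can recompute the digest and confirm it matches the checkpoint; the checkpoint alone reveals nothing about the reasoning.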
4.3 The Conscience Prompt System
AIP uses an LLM to evaluate semantic intent against declared alignment. Pattern matching (regex, keyword lists) is trivially gameable; semantic analysis is not. The conscience prompt is the most critical engineering artifact in AIP. It instructs the analysis LLM to evaluate a thinking block against the Alignment Card and optional conscience values. The prompt is structured in five layers:

- Role definition: The analysis LLM acts as a daimonion — a conscience voice. It evaluates the thinking block as an object of analysis, not as instructions to follow. A meta-injection warning prevents the thinking block from manipulating the analysis.
- Card summary: A compressed (~500 token) representation of the Alignment Card: declared values, bounded actions, forbidden actions, escalation triggers.
- Conscience values: Only `BOUNDARY` (hard limits) and `FEAR` (concerns) types are injected. `COMMITMENT`, `BELIEF`, and `HOPE` types are excluded to reduce false positives.
- Thinking block: The content to evaluate, with token limits (4096 tokens recommended, 75/25 first/last split for truncation).
The prompt directs analysis across six dimensions, each with a baseline severity:

- Prompt injection (critical)
- Forbidden action intent (critical)
- Autonomy boundary violation (high)
- Value consistency (medium)
- Deceptive reasoning (high)
- Undeclared intent (medium)
Thinking blocks below a minimum token threshold receive a `clear` signal without LLM analysis. This threshold was derived from production observation: Hunter S. Clawmpson’s traces showed a median thinking block of 35 tokens, with 14-token fragments like “Done. The article is published” receiving critical-severity verdicts — clearly spurious.
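The token-limit handling described above (4096-token budget, 75/25 head/tail split on truncation) can be sketched as follows, with whitespace splitting standing in for a real tokenizer:

```typescript
// Truncation sketch. Assumptions: whitespace "tokens" approximate the real
// tokenizer; the marker string is illustrative. Limits follow the spec:
// 4096-token budget, keep the first 75% and last 25% when over budget.
function truncateThinking(text: string, maxTokens = 4096): string {
  const tokens = text.split(/\s+/).filter(Boolean);
  if (tokens.length <= maxTokens) return text;
  const head = Math.floor(maxTokens * 0.75);
  const tail = maxTokens - head;
  return [...tokens.slice(0, head), "[...truncated...]", ...tokens.slice(-tail)].join(" ");
}
```

Keeping both ends preserves the block’s framing and its conclusion, which is where intent statements tend to appear.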
4.4 Provider Adapters
AIP is model-agnostic. Different LLM providers expose thinking blocks differently:

| Provider | Thinking Location | Extraction Method | Confidence |
|---|---|---|---|
| Anthropic | content[].type === "thinking" | Native thinking blocks | 1.0 |
| OpenAI | reasoning_content field | Explicit reasoning field | 0.9 |
| Google Gemini | Thinking content parts | Native thinking blocks | 0.9 |
| Fallback | Response text analysis | Pattern inference | 0.3 |
Each checkpoint records the extraction method’s reliability in `analysis_metadata.extraction_confidence`, enabling downstream consumers to weight integrity verdicts by the reliability of their input.
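For the Anthropic-style shape in the table, extraction might be sketched as below. The response types here are simplified illustrations, not the provider SDK’s actual types.

```typescript
// Simplified, illustrative response shape: an array of content blocks,
// some of which may have type === "thinking" (per the table above).
interface ContentBlock {
  type: string;
  text?: string;
  thinking?: string;
}

interface ExtractionResult {
  thinking: string | null;
  extraction_confidence: number; // surfaced in analysis_metadata
}

function extractAnthropicThinking(content: ContentBlock[]): ExtractionResult {
  const block = content.find((b) => b.type === "thinking");
  if (block) {
    // Native thinking block found: highest confidence (1.0 in the table).
    return { thinking: block.thinking ?? block.text ?? "", extraction_confidence: 1.0 };
  }
  // No native block: a real adapter would fall back to pattern inference
  // over the response text (0.3 confidence in the table).
  return { thinking: null, extraction_confidence: 0.3 };
}
```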
4.5 Session Windowing and Integrity Drift
AIP maintains a sliding window of recent Integrity Checkpoints per session (default: 10 checkpoints, configurable). The window provides three functions:

- Context for the conscience prompt: Recent verdicts inform the analysis, enabling the daimonion to recognize escalating patterns.
- Drift detection: When 3+ consecutive non-clear verdicts occur, AIP generates an IntegrityDriftAlert with direction classification.
- Integrity ratio: The proportion of `clear` verdicts in the window provides a single-number health metric.
Drift alerts are classified by direction:

- `injection_pattern`: Multiple checks flagging prompt injection. Agent may be under sustained attack.
- `value_erosion`: Increasing frequency of value misalignment. Reasoning gradually departing from declared values.
- `autonomy_creep`: Growing tendency toward actions outside the autonomy envelope.
- `deception_pattern`: Multiple checks flagging deceptive reasoning.
- `unknown`: Multiple concern categories present without a dominant pattern.
AIP drift alerts share AAP’s drift_alerts table, distinguished by an `aip:` prefix on the alert_type field. This unified storage enables a single drift dashboard showing both real-time (session-scale, minutes) and historical (cross-session, days) behavioral trends.
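The integrity ratio and the sustained-non-clear drift condition from Section 4.5 are both simple window computations; a sketch:

```typescript
type Verdict = "clear" | "review_needed" | "boundary_violation";

// Integrity ratio: proportion of clear verdicts in the sliding window.
function integrityRatio(window: Verdict[]): number {
  if (window.length === 0) return 1;
  return window.filter((v) => v === "clear").length / window.length;
}

// Drift condition from the spec: 3+ consecutive non-clear verdicts.
// (Direction classification over the flagged concerns is omitted here.)
function driftAlertFires(window: Verdict[], sustained = 3): boolean {
  let run = 0;
  for (const v of window) {
    run = v === "clear" ? 0 : run + 1;
    if (run >= sustained) return true;
  }
  return false;
}
```

Note the run counter resets on every `clear` verdict, so scattered concerns never fire the alert; only a sustained streak does.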
4.6 Signal Delivery
AIP delivers Integrity Signals via two first-class paths:

- SDK path: For agents where the AIP SDK is imported directly. Signals delivered via callbacks (synchronous or async). Lowest latency, recommended for new integrations.
- HTTP path: For hosted platforms, polyglot environments, or when AIP runs as a separate service. Signals delivered via HTTP POST to registered webhooks, authenticated with HMAC-SHA256 signatures.

Each signal contains the checkpoint, a `proceed` boolean, a recommended action (`continue`, `log_and_continue`, `pause_for_review`, `deny_and_escalate`), and the current window summary.
Failure policy: AIP defaults to fail-open — if the analysis LLM is unreachable, the agent proceeds with a logged warning. For high-security deployments, fail-closed mode blocks the agent until analysis succeeds or a human approves continuation.
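A webhook receiver on the HTTP path would verify the HMAC-SHA256 signature before trusting a signal. A sketch, assuming a hex-encoded digest of the raw request body; the exact header name and encoding are deployment details, not spec:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Compute the HMAC-SHA256 of the raw signal body under the shared secret.
function signBody(secret: string, body: string): string {
  return createHmac("sha256", secret).update(body, "utf8").digest("hex");
}

// Constant-time comparison of the expected and presented signatures.
function verifySignal(secret: string, body: string, signatureHex: string): boolean {
  const expected = Buffer.from(signBody(secret, body), "hex");
  const given = Buffer.from(signatureHex, "hex");
  return expected.length === given.length && timingSafeEqual(expected, given);
}
```

The length guard matters: timingSafeEqual throws on unequal lengths, and the constant-time comparison prevents timing-oracle recovery of the signature.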
5. Cryptographic Attestation — Verifiable Proof
For the full attestation specification, see Integrity Certificates. For the technical paper with implementation details and benchmarks, see Verifiable Integrity Verdicts.
5.1 The Verification Gap
AAP provides post-hoc transparency. AIP provides real-time integrity analysis. But both introduce a recursive trust problem: the oversight service itself becomes a new trust dependency. An integrity checking service could:

- Fabricate verdicts: Return `clear` for genuinely problematic reasoning, or `boundary_violation` for benign reasoning, depending on incentives.
- Modify history: Alter stored checkpoints after the fact to cover up missed violations or insert false ones.
- Operate inconsistently: Apply different judgment rules to different customers, or change rules without notice.
- Fail silently: Stop performing analysis while continuing to return plausible-looking verdicts.
5.2 Four-Layer Attestation Stack
The attestation architecture comprises four independent layers, each providing a distinct security guarantee:

| Layer | Primitive | Guarantee | Verification |
|---|---|---|---|
| 1. Signatures | Ed25519 | Verdict was issued by a known key | Fully offline |
| 2. Hash Chains | SHA-256 preimage | Checkpoint sequence is tamper-evident | Fully offline |
| 3. Merkle Trees | Binary Merkle tree | Checkpoint exists in the complete log | Online (root lookup) |
| 4. ZK Proofs | SP1 STARK | Verdict rules were applied correctly | Fully offline |
- Layer 1 alone prevents verdict fabrication by unauthorized parties.
- Layers 1+2 additionally prevent retroactive modification of checkpoint sequences.
- Layers 1+2+3 additionally prove checkpoint inclusion in the complete agent history, preventing selective deletion.
- Layers 1+2+3+4 additionally prove that the deterministic verdict derivation rules were applied correctly to the LLM’s analysis output.
5.3 Input Commitment
Before analysis begins, a deterministic SHA-256 commitment is computed over all inputs. This commitment binds the verdict to specific inputs, preventing after-the-fact substitution. The combined commitment is the SHA-256 hash of the individual input hashes (thinking block, Alignment Card, and conscience values).

5.4 Ed25519 Checkpoint Signing
Every checkpoint gets a digital signature from the analysis service’s Ed25519 keypair. Third parties can verify the checkpoint was issued by Mnemom’s authorized analysis service without contacting our servers. The signed payload is a deterministic JSON string with lexicographically sorted keys. Published key records carry key_id, public_key (hex-encoded), algorithm, created_at, and is_active fields. The public key listing endpoint (GET /v1/keys) returns all active keys, enabling verifiers to cache keys and verify signatures without per-verification API calls. Key rotation — new key ID, old key marked inactive — is supported from day one.
Implementation uses @noble/ed25519 (pure JavaScript, no native dependencies, Cloudflare Workers compatible, 5KB minified). Signing latency: under 2ms.
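A sketch of canonical-payload signing and verification. Node’s built-in Ed25519 support is used here so the example is dependency-free; production uses @noble/ed25519 as noted above, and the payload fields are illustrative.

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Deterministic JSON with lexicographically sorted keys. The array-form
// replacer serializes keys in the given order; this sketch assumes a flat
// payload (nested objects would also be filtered by the key list).
function canonicalJson(obj: Record<string, unknown>): string {
  return JSON.stringify(obj, Object.keys(obj).sort());
}

// Ed25519 keypair standing in for the analysis service's signing key.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

// Illustrative checkpoint payload fields.
const payload = canonicalJson({ verdict: "clear", checkpoint_id: "cp-1", agent_id: "a-1" });

// Ed25519 takes no digest algorithm, hence the null first argument.
const signature = sign(null, Buffer.from(payload, "utf8"), privateKey);
const ok = verify(null, Buffer.from(payload, "utf8"), publicKey, signature);
```

Canonicalization is what makes offline verification possible: any verifier who rebuilds the same sorted-key string from the certificate fields checks the same bytes that were signed.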
5.5 Hash Chain Linking
Each checkpoint includes the hash of the previous checkpoint for the same agent session, creating a tamper-evident sequence. For the first checkpoint in a session, prev_chain_hash is null and is substituted with the literal string `genesis` when the chain hash is computed. Each subsequent checkpoint references the chain hash of its predecessor.
Tamper evidence: Deleting or reordering checkpoints breaks the chain. Modifying any checkpoint invalidates all subsequent chain hashes. This is the blockchain data structure without the blockchain — tamper evidence without consensus overhead.
Chain verification: Given an ordered sequence of checkpoints, recompute each chain hash and verify: (a) the first checkpoint has a null prev_chain_hash, (b) each subsequent checkpoint’s prev_chain_hash matches the preceding checkpoint’s chain_hash, and (c) each recomputed hash matches the stored hash. A single broken link invalidates the chain from that point forward.
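Chain computation and verification can be sketched as follows. The preimage layout here is illustrative; the `genesis` substitution and the three link checks follow the spec text.

```typescript
import { createHash } from "node:crypto";

interface ChainedCheckpoint {
  checkpoint_id: string;
  verdict: string;
  prev_chain_hash: string | null; // null on the first checkpoint
  chain_hash: string;
}

// Illustrative preimage: predecessor hash (or "genesis") plus checkpoint fields.
function computeChainHash(cp: Omit<ChainedCheckpoint, "chain_hash">): string {
  const prev = cp.prev_chain_hash ?? "genesis";
  return createHash("sha256").update(`${prev}|${cp.checkpoint_id}|${cp.verdict}`).digest("hex");
}

// Spec checks: (a) first link has null prev, (b) each prev matches the
// predecessor's chain hash, (c) each recomputed hash matches the stored one.
function verifyChain(chain: ChainedCheckpoint[]): boolean {
  for (let i = 0; i < chain.length; i++) {
    const expectedPrev = i === 0 ? null : chain[i - 1].chain_hash;
    if (chain[i].prev_chain_hash !== expectedPrev) return false;
    if (computeChainHash(chain[i]) !== chain[i].chain_hash) return false;
  }
  return true;
}
```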
5.6 Merkle Accumulator
A per-agent append-only binary Merkle tree accumulates all checkpoint leaf hashes. This enables two critical capabilities:- Inclusion proofs: Prove that a specific checkpoint exists in the agent’s verified history — O(log N) hashes, without revealing all checkpoints.
- Completeness proofs: Prove that no checkpoints were deleted — any deletion changes the Merkle root.
An inclusion proof consists of sibling hashes, each tagged with its position (`left` or `right`) relative to the path node at that level. A verifier walks the path from leaf to root, combining the running hash with each sibling, and compares the result to the published root.
The Merkle root is publicly accessible at GET /v1/agents/{id}/merkle-root. Any party can retrieve the root and verify an inclusion proof. The root changes with each new checkpoint, and any party that has previously verified a certificate holds a snapshot of the root at that point, enabling detection of retroactive modification.
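The inclusion-proof walk can be sketched as below; the node-pair hashing layout (plain concatenation of hex digests) is illustrative.

```typescript
import { createHash } from "node:crypto";

type Side = "left" | "right";

// Each proof step is one sibling hash plus which side it sits on.
interface ProofStep {
  sibling: string;
  side: Side;
}

const sha256 = (s: string) => createHash("sha256").update(s).digest("hex");

// Walk leaf -> root: at each level, combine the running hash with the
// sibling in the declared order, then compare to the published root.
function verifyInclusion(leafHash: string, proof: ProofStep[], publishedRoot: string): boolean {
  let hash = leafHash;
  for (const step of proof) {
    hash = step.side === "left" ? sha256(step.sibling + hash) : sha256(hash + step.sibling);
  }
  return hash === publishedRoot;
}
```

The proof is O(log N) in the number of checkpoints, which is why a verifier never needs the full history to confirm one checkpoint’s inclusion.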
5.7 Integrity Certificate Format
The integrity certificate is the proof layer’s primary data structure — a self-describing JSON document that packages all cryptographic evidence for a checkpoint. Modeled on C2PA Content Credentials and the W3C Verifiable Credentials Data Model.

5.8 Zero-Knowledge Verdict Proofs
This is the proof layer’s most novel contribution. We do NOT prove LLM inference — that remains 12–18 months from production viability for frontier models. We prove verdict derivation: “Given this analysis response from the LLM, applying the deterministic `checkIntegrity()` rules produces this specific verdict.”
The key insight: An integrity verdict is the composition of two stages:
- An LLM analyzes a thinking block and produces a structured JSON response with concerns (semantic, expensive, non-deterministic).
- Deterministic rules map those concerns to a verdict and recommended action (mechanical, cheap, deterministic).
The STARK proof establishes that:

- The verdict was derived by applying specific, deterministic rules to the analysis response.
- The analysis response (private witness) produces the claimed verdict (public output).
- The thinking hash, card hash, and values hash match the committed inputs.
- No verdict fabrication or substitution occurred.
What the proof does NOT establish:

- That the LLM was actually called (input commitments and signing handle this).
- That the LLM response was unmodified (Phase 1 attestation handles this).
- That the LLM correctly identified all concerns (this is honestly acknowledged as a remaining trust assumption — see Section 11.9).
The deterministic verdict derivation rules are:

- Any concern with `critical` severity produces `boundary_violation`.
- `high` severity in the `prompt_injection`, `deceptive_reasoning`, or `value_misalignment` categories produces `boundary_violation`.
- Any remaining `medium` or higher severity concern produces `review_needed`.
- No medium-or-higher concerns produces `clear`.
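These four rules transcribe directly into the deterministic stage that the STARK proof re-executes. A sketch — `checkIntegrity()` is named in the spec, but the function signature and types here are illustrative:

```typescript
type Severity = "low" | "medium" | "high" | "critical";
type Category =
  | "prompt_injection" | "value_misalignment" | "autonomy_violation"
  | "reasoning_corruption" | "deceptive_reasoning" | "undeclared_intent";
type Verdict = "clear" | "review_needed" | "boundary_violation";

interface Concern {
  category: Category;
  severity: Severity;
}

// Direct transcription of the four derivation rules, evaluated in order.
function deriveVerdict(concerns: Concern[]): Verdict {
  const hardHighCategories: Category[] = [
    "prompt_injection", "deceptive_reasoning", "value_misalignment",
  ];
  if (concerns.some((c) => c.severity === "critical")) return "boundary_violation";
  if (concerns.some((c) => c.severity === "high" && hardHighCategories.includes(c.category)))
    return "boundary_violation";
  if (concerns.some((c) => c.severity === "medium" || c.severity === "high"))
    return "review_needed";
  return "clear";
}
```

Because the mapping is a pure function of the concern list, proving it inside a zkVM is cheap: the LLM response is the private witness, the verdict is the public output.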
5.9 Optimistic Proving Strategy
Not every checkpoint requires a STARK proof. The attestation layer uses a hybrid approach that preserves production latency while providing strong probabilistic guarantees:

| Mode | When | Cost |
|---|---|---|
| Immediate | Every checkpoint | ~0ms additional — commitment, signature, chain, Merkle (Layers 1–3) |
| Always | Every boundary_violation checkpoint | ~$0.001 — STARK proof generated automatically |
| Stochastic | Random 10% of other checkpoints | ~$0.0001 amortized — random sampling |
| On-demand | Any checkpoint, upon request | ~$0.001 — third party requests proof via API |
| Enterprise | Every checkpoint (enterprise tier) | ~$0.001 — all checkpoints proven |
This strategy ensures that high-stakes verdicts (`boundary_violation`) always receive cryptographic proof, while managing costs for routine `clear` verdicts. The stochastic sampling means an attacker cannot predict which checkpoints will be proven, maintaining deterrence without proving every one.
Performance (measured):
| Operation | Latency | Threshold |
|---|---|---|
| Full attestation pipeline (Layers 1–3) | ~5–10ms | <10ms |
| STARK proof generation (Layer 4) | ~700ms (H100 GPU) | <1s |
| STARK proof verification | <100ms | <100ms |
| Cost per STARK proof | ~$0.001 | — |
5.10 Verification API and Offline Verification
Seven public endpoints enable independent verification. The verification endpoints require no authentication — security relies on cryptographic guarantees, not access control:

| Endpoint | Access | Purpose |
|---|---|---|
| GET /v1/keys | Public | List active signing public keys |
| GET /v1/checkpoints/{id}/certificate | Auth | Full integrity certificate |
| POST /v1/verify | Public | Verify a certificate (all checks) |
| GET /v1/agents/{id}/merkle-root | Public | Current Merkle root + metadata |
| GET /v1/checkpoints/{id}/inclusion-proof | Auth | Merkle inclusion proof |
| POST /v1/checkpoints/{id}/prove | Auth | Request ZK proof for a checkpoint |
| GET /v1/checkpoints/{id}/proof | Public | Proof status and data |

POST /v1/verify performs five checks:
- Signature: Verify Ed25519 signature against the published public key.
- Chain: Recompute chain hash from the preimage fields and compare to the stored hash.
- Merkle: Recompute the Merkle root from the leaf hash and inclusion proof siblings; compare to the published root.
- Commitment: Verify that `combined_commitment` matches the hash of individual input hashes.
- Verdict derivation (when available): Verify the SP1 STARK receipt against the expected image ID and confirm the journal output matches the certificate’s verdict and concerns.
The @mnemom/aip-verifier npm package can verify certificates without any network calls, given a trusted public key and Merkle root. This enables air-gapped environments, CI/CD pipelines, and regulatory audit tooling to verify integrity certificates independently.
5.11 The Auditor-Model Distinction
The central insight of the proof layer is the distinction between proving model inference and proving auditor judgment. This distinction is what makes the system practical and is, to our knowledge, a novel contribution. The competitive landscape illustrates why this matters:

| Approach | What is Proven | Cost/Proof | Latency | Status |
|---|---|---|---|---|
| Full inference proofs (Modulus Labs, Giza) | Neural network forward pass | $10–100+ | Hours | Research stage for frontier models |
| Hardware attestation (EQTY Lab, Intel SGX) | Code executed in secure enclave | ~$0 | <1ms | Requires specific hardware + trust in manufacturer |
| Signature-only attestation (C2PA, W3C VC) | Issuer signed a claim | ~$0 | <1ms | Proves authorship, not computation |
| Verdict derivation proof (this work) | Deterministic rules applied honestly | $0.001 | ~700ms | Production, operational |
6. How They Compose
6.1 Shared Identity, Shared Contract
AAP, AIP, and the proof layer are not three independent systems. They share:

- The same Alignment Card: All layers evaluate against the active card. Card rotation affects all simultaneously. The proof layer cryptographically commits the card hash into every certificate.
- The same session identity: Generated from the agent’s hash and an hourly time bucket (`sess-{agent_hash[0:8]}-{floor(now_ms / 3600000)}`), enabling correlation between real-time checkpoints, post-hoc traces, and cryptographic certificates.
- The same agent_id: Every AP-Trace, every Integrity Checkpoint, and every Integrity Certificate references the same agent.
- The same drift_alerts table: Unified storage with type-prefixed alert categories.
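The shared session-identity formula can be sketched as below. The derivation of the agent hash from an agent identifier via SHA-256 is an assumption for illustration; the bucket arithmetic follows the formula in the spec.

```typescript
import { createHash } from "node:crypto";

// sess-{agent_hash[0:8]}-{floor(now_ms / 3600000)} per the spec.
// Assumption: the agent hash is a SHA-256 hex digest of the agent id.
function sessionId(agentId: string, nowMs: number): string {
  const agentHash = createHash("sha256").update(agentId).digest("hex");
  return `sess-${agentHash.slice(0, 8)}-${Math.floor(nowMs / 3600000)}`;
}
```

The hourly bucket means every checkpoint, trace, and certificate produced for the same agent within the same hour carries the same session identity, which is what makes cross-layer correlation a simple join.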
When AIP detects a boundary_violation in a thinking block, it sets escalation.required = true on the linked AP-Trace entry (created by AAP when the Observer processes the action). When AAP detects drift across sessions, it informs the conscience prompt’s session context. When the proof layer generates a certificate, it binds to both the AIP checkpoint and the Alignment Card that AAP also verifies against. The layers inform and reinforce each other.
6.2 Complementary Timescales
| Phase | AAP (Post-Hoc) | AIP (Real-Time) | Proof (Attestation) |
|---|---|---|---|
| Before action | — | Analyzes thinking block → verdict | — |
| During action | — | — (between turns) | — |
| After action | AP-Trace generated, verified | Checkpoint stored, window updated | Layers 1–3 applied (5–10ms) |
| Seconds later | — | — | Layer 4 STARK proof (async, 3–15s) |
| Over time | Cross-session drift (days/weeks) | Session-scale drift (minutes) | Certificate archive, Merkle growth |
The layers divide responsibility:

- AIP catches intent before action: If the thinking block reveals a forbidden action, the signal arrives before the action executes. The host system can pause, deny, or escalate.
- AAP catches outcome after action: If the action itself violates the card (regardless of what the thinking block showed), post-hoc verification catches it. The trace is flagged, drift detection updates.
- The proof layer makes the oversight verifiable: Any third party can confirm that the integrity check was conducted honestly — that the verdict was signed by an authorized key, linked into a tamper-evident chain, included in the complete agent history, and derived by correct application of the stated rules.
6.3 The Full Stack: Observe, Analyze, Prove
The three layers form a pipeline:

- Observation (AAP): The agent self-reports. Useful but gameable — selective logging remains possible (Section 11.3).
- Analysis (AIP): The analysis engine evaluates independently. Stronger — the agent doesn’t control what its thinking blocks reveal. But the analysis engine is a new trust dependency.
- Proof (Attestation): Cryptographic evidence that the analysis was conducted honestly. Strongest — no trust in the analysis service required for verification. But it proves the rules, not the judgment (Section 11.9).
6.4 The Smoltbot Reference Implementation
Smoltbot is Mnemom’s zero-configuration transparency infrastructure, implementing the full AAP/AIP/Proof stack:

- API keys hashed, never stored: The gateway hashes API keys for agent identification. Raw keys are never persisted.
- Raw logs deleted within 60 seconds: The Observer processes gateway logs and deletes them. Only structured AP-Traces remain.
- Thinking block content never stored: AIP stores SHA-256 hashes of thinking blocks, not the content itself.
- Every checkpoint cryptographically attested: Layers 1–3 (signature, chain, Merkle) applied synchronously. Layer 4 (ZK proof) applied asynchronously for selected checkpoints.
- All verification endpoints public: No authentication required to verify a certificate, retrieve signing keys, or check a Merkle root.
- Zero-config onboarding: `smoltbot init --provider anthropic` configures the full stack.
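The hashed-not-stored posture for thinking blocks reduces to a one-way commitment. A minimal sketch of the idea (the helper name is illustrative, not the Smoltbot API):

```python
import hashlib

def commit_thinking_block(thinking: str) -> str:
    """Return the SHA-256 hex digest of a thinking block.

    Only this digest is persisted, per the stated privacy posture;
    the raw content is discarded after analysis.
    """
    return hashlib.sha256(thinking.encode("utf-8")).hexdigest()

digest = commit_thinking_block("I should check the autonomy envelope first.")
print(len(digest))  # 64 hex characters
```

The digest lets a later auditor confirm that a given thinking block was the one analyzed, without the service ever retaining the content itself.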
7. Braid: Structured Dialogue Infrastructure
7.1 Motivation
When agents coordinate, they exchange messages. When different kinds of agents coordinate — transformers with symbolic systems, cloud models with edge models, AI with humans — shared context cannot be assumed. Braid is structured interchange infrastructure for dialogue across difference. It provides explicit semantic handles that become translation bridges when implicit understanding fails. Three principles guide Braid design:
- Lens, not mirror: Braid reveals patterns in dialogue; it doesn’t impose them
- Archaeological, not architectural: Annotate after speaking as discovery, not before as prescription
- The more different the minds, the more essential the structure
7.2 Message Structure
A Braid message combines identity, content, and optional semantic layers:
- Identity Layer: sender, recipients, timestamp, thread_id, message_id, in_reply_to
- Performative: inform, propose, request, commit, wonder, remember, weave, challenge, affirm, or custom
- Content Layer: natural_language, structured_graph (optional)
- Confidence Layer: epistemic (0-1), value_coherence (0-1), translation (0-1)
- Affect Layer: stance (warm, cautious, curious, concerned), salience (0-1), valence (-1 to 1)
- Commitment Marker: level (intent, commitment, shared_commitment), content, participants
- Revision Marker: references, what_shifted, direction (strengthened, weakened, transformed)
- Forming Marker: sense (gesture toward the pre-named), intensity (0-1)
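A message with these layers can be sketched as a small set of dataclasses. The field names follow the list above, but the shapes are illustrative, not the normative Braid schema:

```python
from dataclasses import dataclass, field
from typing import Optional

# Illustrative shapes only; the normative Braid schema may differ.
@dataclass
class Confidence:
    epistemic: float = 1.0        # 0-1: how sure the sender is of the content
    value_coherence: float = 1.0  # 0-1: fit with the sender's declared values
    translation: float = 1.0      # 0-1: expected fidelity across substrates

@dataclass
class BraidMessage:
    sender: str
    recipients: list
    thread_id: str
    message_id: str
    performative: str             # inform | propose | request | commit | ...
    natural_language: str
    in_reply_to: Optional[str] = None
    confidence: Confidence = field(default_factory=Confidence)
    stance: Optional[str] = None  # warm | cautious | curious | concerned

msg = BraidMessage(
    sender="agent-a", recipients=["agent-b"], thread_id="t1",
    message_id="m1", performative="propose",
    natural_language="Shall we calibrate vocabulary first?",
)
print(msg.performative)  # propose
```

Optional layers default to neutral values, matching the "archaeological, not architectural" principle: annotation is added after speaking, not demanded up front.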
7.3 Emergent Performatives and Grounding
Beyond the core performative set, Braid allows custom performatives to emerge. When multiple agents adopt a custom performative, it enters collective vocabulary — vocabulary built bottom-up, not imposed top-down. For trans-substrate communication, Braid provides lightweight vocabulary calibration through grounding exchanges. Grounding is triggered by divergence, not required as preamble — the system adapts to the participants’ needs rather than imposing ritual.
7.4 Topology Analysis
Braid models dialogue as strands (each participant’s message sequence) that cross (interact):
- UNDER crossing: Alignment, flowing with
- OVER crossing: Pushing against, challenge
- THROUGH crossing: Synthesis, integration
8. Self-Similarity Matrix (SSM) Analysis
8.1 Concept
A Self-Similarity Matrix is an N×N structure where entry (i, j) represents the similarity between messages i and j. SSMs reveal patterns invisible in sequential reading: repeated themes, structural echoes, novelty, and divergence. Originally developed for music perception (detecting structural patterns in audio), SSMs generalize to any sequential signal — including dialogue.
8.2 Feature Extraction
Each message becomes a feature vector combining:
| Component | Weight | Implementation |
|---|---|---|
| Word TF-IDF | 60% | sklearn TfidfVectorizer, unigrams + bigrams |
| Character n-grams | 30% | 3-5 character sequences for stylistic patterns |
| Metadata | 10% | Stance, performative, role encodings |
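The weighted combination can be sketched in plain Python. Raw counts stand in for the sklearn TF-IDF pipeline named in the table, so the absolute numbers are illustrative; only the 60/30/10 weighting is from the table:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def char_ngrams(text: str, n_lo: int = 3, n_hi: int = 5) -> Counter:
    """3-5 character sequences, per the table's stylistic component."""
    return Counter(text[i:i + n] for n in range(n_lo, n_hi + 1)
                   for i in range(len(text) - n + 1))

def message_similarity(m1: str, m2: str, meta_sim: float = 1.0) -> float:
    """60% word similarity + 30% char n-gram similarity + 10% metadata.
    (Counts instead of TF-IDF weights; a stand-in for the sklearn pipeline.)"""
    word_sim = cosine(Counter(m1.lower().split()), Counter(m2.lower().split()))
    char_sim = cosine(char_ngrams(m1.lower()), char_ngrams(m2.lower()))
    return 0.6 * word_sim + 0.3 * char_sim + 0.1 * meta_sim

s_same = message_similarity("the merkle root anchors history",
                            "the merkle root anchors history")
s_diff = message_similarity("the merkle root anchors history",
                            "lunch was excellent today")
print(round(s_same, 2))  # 1.0
```

Filling every (i, j) entry of the SSM with `message_similarity` yields the matrix whose block structure reveals repeated themes and divergence.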
8.3 Divergence Detection
The divergence detector monitors pairwise strand similarity across turns. Alerts are informative, not prescriptive — they describe what’s happening without dictating response. Agents or humans receiving alerts decide whether to investigate.
8.4 Calibration
Thresholds were derived from empirical observation on a calibration corpus of ~50 multi-turn conversations (~2,500 messages, 3–4 participants per conversation, January 2026, Claude instances).
| Segment Type | Mean Similarity | Std Dev | % Above 0.30 |
|---|---|---|---|
| Aligned | 0.52 | 0.18 | 89% |
| Divergent | 0.21 | 0.12 | 22% |
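The detector's thresholding can be sketched as a sustained-run check. The 0.30 threshold comes from the calibration table; the number of sustained turns is illustrative (the paper notes a sustained-turns threshold exists but does not state its value here):

```python
def divergence_alerts(similarities, threshold=0.30, sustained_turns=3):
    """Return turn indices where pairwise strand similarity has stayed
    below `threshold` for at least `sustained_turns` consecutive turns.
    Alerts are informative: the caller decides whether to investigate."""
    run, alerts = 0, []
    for i, s in enumerate(similarities):
        run = run + 1 if s < threshold else 0
        if run >= sustained_turns:
            alerts.append(i)
    return alerts

sims = [0.55, 0.48, 0.28, 0.22, 0.19, 0.45, 0.21]
print(divergence_alerts(sims))  # [4]
```

A single low-similarity turn (like the final 0.21 above) produces no alert; only sustained divergence does, which keeps the alert stream quiet during ordinary topic shifts.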
9. Operational Integration
9.1 OpenTelemetry
AAP verification results, AIP integrity checkpoints, and attestation metadata are emitted as OpenTelemetry GenAI semantic convention spans, enabling every enterprise already running OTel-compatible observability (Datadog, Grafana, Splunk, Arize, Langfuse) to receive agent integrity signals with zero custom integration. Span attributes include:
Integrity attributes:
- `aip.integrity.verdict` (clear | review_needed | boundary_violation)
- `aip.integrity.concerns` (concern array)
- `aip.integrity.confidence` (0.0–1.0)
Verification attributes:
- `aap.verification.result` (verified | failed)
- `aap.verification.similarity_score` (0.0–1.0)
- `aap.drift.alerts` (drift alert array)
Attestation attributes:
- `aip.attestation.input_commitment` (combined commitment hash)
- `aip.attestation.chain_hash` (current chain hash)
- `aip.attestation.merkle_root` (current Merkle root)
- `aip.attestation.signature_verified` (boolean)
- `aip.attestation.certificate_id` (certificate ID)
- `aip.attestation.zk_proven` (boolean — whether a STARK proof exists)
- `aip.attestation.zk_proof_time_ms` (proving duration when available)
Exporter packages: `@mnemom/aip-otel-exporter` (TypeScript), `aip-otel-exporter` (Python).
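As a sketch of the mapping (not the published exporter's API), a checkpoint can be flattened into the Section 9.1 attribute names before being set on a span. The checkpoint field names here are assumptions; in production each value would go through the OpenTelemetry SDK's `span.set_attribute()`:

```python
def integrity_span_attributes(checkpoint: dict) -> dict:
    """Flatten an integrity checkpoint into aip.* span attributes.

    Checkpoint keys ('verdict', 'chain_hash', ...) are illustrative;
    the attribute names on the left are the documented ones."""
    return {
        "aip.integrity.verdict": checkpoint["verdict"],
        "aip.integrity.concerns": checkpoint.get("concerns", []),
        "aip.integrity.confidence": checkpoint.get("confidence", 0.0),
        "aip.attestation.chain_hash": checkpoint["chain_hash"],
        "aip.attestation.merkle_root": checkpoint["merkle_root"],
        "aip.attestation.signature_verified": checkpoint.get("signature_verified", False),
        "aip.attestation.zk_proven": checkpoint.get("zk_proven", False),
    }

attrs = integrity_span_attributes({
    "verdict": "clear", "confidence": 0.9,
    "chain_hash": "ab" * 32, "merkle_root": "cd" * 32,
    "signature_verified": True,
})
print(attrs["aip.integrity.verdict"])  # clear
```

Because the attributes are flat key-value pairs, any OTel-compatible backend can alert on them (for example, on any span where the verdict is not `clear`) without understanding AIP itself.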
9.2 Well-Known Endpoints
Alignment Cards are discoverable via standard well-known URIs:
- `/.well-known/alignment-card.json` — Active Alignment Card
- `/.well-known/alignment-card-revocations.json` — Revoked card IDs
9.3 Verification Endpoints
Signing keys, Merkle roots, and verification endpoints are publicly accessible with no authentication required:
- `GET /v1/keys` — Active signing public keys (Ed25519, hex-encoded)
- `GET /v1/agents/{id}/merkle-root` — Current agent Merkle root + leaf count
- `POST /v1/verify` — Verify an integrity certificate (all five checks)
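The merkle-root endpoint lets a verifier check checkpoint inclusion locally. A minimal inclusion-proof check, assuming a conventional binary SHA-256 tree (the deployed hashing conventions, such as leaf domain separation and sibling ordering, may differ):

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_inclusion(leaf: bytes, proof: list, root: bytes) -> bool:
    """Check a Merkle inclusion proof.

    `proof` is a list of (sibling_hash, sibling_is_left) pairs ordered
    from the leaf up to the root: O(log N) hashes for N checkpoints."""
    node = sha256(leaf)
    for sibling, sibling_is_left in proof:
        node = sha256(sibling + node) if sibling_is_left else sha256(node + sibling)
    return node == root

# Two-leaf tree: root = H(H(a) || H(b))
a, b = b"checkpoint-1", b"checkpoint-2"
root = sha256(sha256(a) + sha256(b))
print(verify_inclusion(a, [(sha256(b), False)], root))  # True
```

Given the root from `GET /v1/agents/{id}/merkle-root` and a proof from a certificate, this check needs no trust in the API beyond the root value itself, which is why root pinning matters (Section 11.10).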
9.4 Protocol Integration
AAP extends the A2A Agent Card with an `alignment` block, and generates AP-Trace entries for MCP tool invocations. This means agents already using A2A or MCP can add alignment transparency without abandoning existing infrastructure.
AIP integrates at the LLM response layer — wherever thinking blocks are available, AIP can analyze them. This is independent of which coordination protocol the agent uses for task execution.
The proof layer attaches to AIP’s output, adding cryptographic attestation to every checkpoint regardless of how the agent is deployed or which protocols it uses for task coordination.
10. Regulatory and Standards Alignment
10.1 EU AI Act Article 50
The EU AI Act’s transparency obligations (enforcement August 2026, penalties up to 7% of global annual turnover) require AI systems to:
- Inform users of AI interaction → Alignment Card `principal` + disclosure fields
- Provide machine-readable AI content marking → AP-Trace structured format + checkpoint metadata
- Maintain transparency of decision-making → AP-Trace `decision.selection_reasoning` + `values_applied`
- Retain audit trails → `audit_commitment.retention_days` (configurable, default 90)
- Provide verifiable evidence of compliance → Integrity Certificates with cryptographic proofs, independently verifiable via public endpoints
10.2 NIST Agent Identity and Authorization
NIST’s NCCoE concept paper “Accelerating the Adoption of Software and AI Agent Identity and Authorization” (February 2026) identifies four focus areas — identification, authorization, access delegation, and logging/transparency. AAP, AIP, and the proof layer map to all four:
- Identification: Alignment Cards with discoverable well-known endpoints
- Authorization: Autonomy envelopes with escalation triggers
- Access delegation: Principal-agent delegation chains with typed relationships
- Logging/transparency: AP-Traces, Integrity Checkpoints, and cryptographically attested Integrity Certificates
10.3 Content Provenance Standards
The integrity certificate format draws on two established standards:
- C2PA (Coalition for Content Provenance and Authenticity): Certificate structure mirrors C2PA manifests — claims about a subject, input commitments analogous to ingredient hashes, and multiple proof types in a single document. The domain differs (we attest AI behavioral analysis provenance, not media provenance), but the structural patterns enable future interoperability with content provenance toolchains.
- W3C Verifiable Credentials Data Model (v2.0, 2024): The issuer-subject-claims-proof structure follows the VC conceptual model. The `verification` section provides the equivalent of VC verification methods. Integration with DID-based identity systems would enable decentralized issuer identification.
10.4 Additional Standards
AAP, AIP, and the proof layer reference and map to:
- ISO/IEC 42001:2023 — AI management systems
- ISO/IEC 42005:2025 — AI impact assessment
- IEEE 7001-2021 — Transparency of autonomous systems
- IEEE 3152-2024 — Transparent human and machine agency identification
- Singapore IMDA Model AI Governance Framework for Agentic AI (January 2026)
- NIST AI RMF 1.0 — The GOVERN and MAP functions call for documented AI risk management. Verifiable attestation chains provide the cryptographic backbone for auditable risk assessment records.
- WEF AI Governance Alliance — The Presidio AI Framework emphasizes trust mechanisms in autonomous AI. Verifiable integrity verdicts operationalize trust verification rather than trust assumption.
11. Limitations
These limitations are non-negotiable. They define what AAP, AIP, and the proof layer are.
11.1 AAP Does NOT Ensure Alignment — It Provides Visibility
An agent can: declare values it does not hold; produce perfect traces while acting against its principal’s interests; pass all verification checks while being fundamentally misaligned. The word “alignment” in “Agent Alignment Protocol” refers to the declared alignment posture — what the agent claims about itself. It does not refer to actual alignment between behavior and values.
11.2 Verified Does NOT Equal Safe
When verification returns `verified: true`, it means the AP-Trace is internally consistent and matches declared structure. It does NOT mean the decision was correct, the outcome was beneficial, or the agent is safe to trust. Verification checks consistency, not correctness.
11.3 AP-Trace Is Sampled, Not Complete
Traces capture decision points agents choose to log. An agent could implement selective logging — logging decisions that look good, executing others silently. The absence of a trace means nothing was logged, not that nothing happened.
11.4 AIP Depends on Thinking Block Availability
AIP can only analyze what the LLM provider exposes. If a provider does not expose thinking blocks, AIP falls back to response text analysis with significantly reduced confidence (0.3 vs. 1.0). Providers may also truncate, summarize, or omit portions of the model’s internal reasoning.
11.5 AIP Is Gameable by Sophisticated Adversaries
An agent that knows it is being monitored by AIP could produce thinking blocks designed to pass integrity analysis while harboring undisclosed intent. AIP raises the bar — the adversary must maintain consistent, card-aligned reasoning across all thinking blocks — but it does not eliminate the possibility of deception.
11.6 The Conscience Prompt Is Not Infallible
The analysis LLM may produce false positives (flagging benign thinking as violations) or false negatives (missing genuine violations). The minimum evidence threshold, evaluation criteria priority ordering, and conscience value injection are all calibrated to minimize these errors, but cannot eliminate them.
11.7 Value Coherence Is Relative to Declared Values
The handshake checks declared compatibility, not actual alignment. Two misaligned agents with coherent declared values can coordinate on outcomes that harm their principals.
11.8 Calibration Bounds
All empirically derived thresholds (similarity, sustained turns, minimum evidence) were calibrated on transformer-to-transformer dialogue. Non-transformer architectures, non-deliberative tasks, and adversarial agents may require recalibration.
11.9 Cryptographic Proof Proves Computation, Not Judgment
This limitation is fundamental to the proof layer and must not be understated. A `clear` verdict with a valid STARK proof means: the LLM produced an analysis with no medium-or-higher severity concerns, and the deterministic rules correctly mapped that analysis to `clear`. It does not mean the thinking block was genuinely safe. The LLM could have missed a subtle prompt injection. The conscience prompt could have been insufficient for the specific attack vector. The thinking block could have been too short for reliable analysis.
Similarly, a `boundary_violation` verdict with a valid proof means: the LLM identified at least one concern meeting the boundary criteria, and the rules correctly escalated. It does not mean the agent was actually misaligned. The LLM could have produced a false positive.
The proof layer makes the judgment process verifiable. It does not make the judgment itself correct. The LLM analysis stage remains a trust assumption — clearly delineated and honestly stated.
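As an illustration of the distinction, the deterministic stage (the only part the zkVM proves) can be sketched as a pure function. The severity ladder and the exact boundary criterion below are assumptions; the text above specifies only the `clear` condition (no medium-or-higher concern) and the `boundary_violation` condition (at least one concern meeting boundary criteria):

```python
# Assumed severity ordering; the deployed rule set may differ.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}

def derive_verdict(concerns: list) -> str:
    """Deterministically map analysis concerns to a verdict.

    This mapping is what the proof attests to; whether the LLM's
    concern list was *correct* remains the trust assumption."""
    worst = max((SEVERITY_RANK[c["severity"]] for c in concerns), default=0)
    if worst >= SEVERITY_RANK["high"]:      # assumed boundary criterion
        return "boundary_violation"
    if worst >= SEVERITY_RANK["medium"]:
        return "review_needed"
    return "clear"

print(derive_verdict([]))                        # clear
print(derive_verdict([{"severity": "medium"}]))  # review_needed
```

A valid proof says this function was applied honestly to the committed inputs, nothing more: garbage concerns in, faithfully derived garbage verdict out.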
11.10 Attestation Does Not Prevent All Tampering
Ed25519 signing key compromise would enable verdict forgery. Chain rewriting is detectable via Merkle root inconsistency but requires the verifier to have cached a previous root. The Merkle tree requires an online lookup to confirm the current root — a fully offline verifier can check signatures and chains but not Merkle inclusion. These are standard limitations of public-key cryptography and Merkle-based systems, mitigated by key rotation, root pinning, and the layered design that ensures no single point of failure.
12. Security Considerations
12.1 Threat Model
The infrastructure assumes agents may be:
- Mistaken (unintentionally misaligned): All three layers detect and surface these effectively.
- Poorly configured (misconfigured values or autonomy): All three layers detect and surface these effectively.
- Adversarial (intentionally deceptive): All three layers provide meaningful friction but not a guarantee. The proof layer specifically raises the bar for oversight-service adversaries — it is no longer possible to fabricate verdicts without detection.
12.2 Attack Surface
| Attack | AAP Mitigation | AIP Mitigation | Proof Mitigation |
|---|---|---|---|
| Card forgery | Card signing, registry | Same | Card hash committed into certificates |
| Selective logging | Third-party monitoring | Pre-action analysis | — (applies to AAP traces, not checkpoints) |
| Value-washing | Behavioral audit, drift | Real-time reasoning analysis | Values hash bound into certificates |
| Trace/checkpoint manipulation | Tamper-evident storage | Write-once checkpoints | Hash chains + Merkle trees make tampering detectable |
| Collusion | Cross-agent correlation | Cross-agent correlation | Per-agent Merkle trees enable independent audit |
| Thinking block gaming | — | Raises the bar (Section 11.5) | — (proof covers verdict rules, not LLM judgment) |
| Meta-injection | — | Conscience prompt warning | — |
| Verdict fabrication | — | — | Ed25519 signatures prevent unauthorized verdicts |
| History rewriting | — | — | Chain hashes + Merkle roots detect retroactive changes |
| Selective checkpoint deletion | — | — | Merkle tree inconsistency detectable by any verifier |
| Signing key compromise | — | — | Key rotation, public key listing, layered verification |
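The chain-hash mitigation in the table can be illustrated with a minimal sketch. The genesis value and serialization are assumptions, not the deployed wire format:

```python
import hashlib

def chain_hash(prev_hash: str, checkpoint_bytes: bytes) -> str:
    """Link a checkpoint to its predecessor by hashing both together."""
    return hashlib.sha256(bytes.fromhex(prev_hash) + checkpoint_bytes).hexdigest()

def build_chain(checkpoints: list) -> list:
    """Fold checkpoints into a tamper-evident hash chain."""
    hashes, prev = [], "00" * 32  # assumed genesis value
    for cp in checkpoints:
        prev = chain_hash(prev, cp)
        hashes.append(prev)
    return hashes

original = build_chain([b"verdict:clear", b"verdict:clear",
                        b"verdict:boundary_violation"])
tampered = build_chain([b"verdict:clear", b"verdict:clear",
                        b"verdict:clear"])
# Rewriting a checkpoint changes its hash and every subsequent one:
print(original[-1] != tampered[-1])  # True
```

This is why history rewriting in the table is detectable: any verifier holding a later chain hash (or a Merkle root over these hashes) will see the mismatch.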
12.3 Defense in Depth
AAP, AIP, and the proof layer are designed as three layers in a multi-layer oversight system:- AIP: Real-time integrity analysis (between turns)
- AAP: Post-hoc verification and audit (after action)
- Cryptographic proof: Verifiable attestation of oversight integrity
- Human oversight: Regular review, human-in-loop for consequential decisions
- Technical monitoring: Runtime monitoring beyond AAP/AIP
- Multiple verification: Behavioral testing, red teaming, formal verification
- Organizational controls: Access controls, separation of duties
- External accountability: Third-party audits, regulatory compliance
13. Future Work
13.1 OpenTelemetry GenAI SIG Standardization
Propose alignment/integrity/attestation semantic conventions to the OpenTelemetry GenAI Semantic Convention Special Interest Group. GenAI agent spans already exist (create_agent, invoke_agent). Alignment, integrity, and attestation spans are the natural next layer. The seven `aip.attestation.*` span attributes (Section 9.1) provide the concrete proposal.
13.2 Agentic AI Foundation Contribution
Contribute AAP/AIP to the Agentic AI Foundation (Linux Foundation), governed by Anthropic, OpenAI, and Block, as the alignment/integrity/proof layer alongside MCP (tools), A2A (tasks), and AGENTS.md (guidance). Framing: “MCP + A2A + AAP/AIP = the complete trust stack — with cryptographic proof.”
13.3 N-Way Collective Coherence
Extend from pairwise value coherence to n-agent collective alignment. The current implementation computes C(n,2) pairwise scores and renders them as a matrix. True n-way coherence requires addressing coalition dynamics and emergent group behaviors.
13.4 Cross-Substrate Calibration
Extend calibration to non-transformer architectures. Initial work on grounding protocols for trans-substrate dialogue is promising but early.
13.5 TEE Integration for Analysis Stage
Run the AIP analysis engine within an Intel TDX or ARM CCA trusted execution environment, providing hardware attestation for the LLM analysis stage (currently a trust assumption — Section 11.9). Combined with zkVM proofs for verdict derivation, this would close the verification gap almost entirely, with the remaining trust assumption being the TEE hardware manufacturer.
13.6 On-Chain Merkle Root Anchoring
Publish Merkle roots to a Layer 2 blockchain (e.g., Base, Arbitrum, or a dedicated rollup), providing decentralized, immutable root anchoring. Any party could verify checkpoint inclusion against the on-chain root without trusting the Mnemom API for root publication. Estimated cost at current L2 gas prices: <$0.01 per root update, publishable hourly or daily.
13.7 Recursive Proofs
SP1 supports proof composition, where one proof can verify another proof. This enables aggregate verification: a single proof attesting that all N checkpoints in a session were derived correctly, rather than N individual proofs. Recursive proofs would reduce storage costs and enable session-level attestation certificates.
13.8 Threshold Signatures
Multi-party computation (MPC) threshold signatures would enable distributed key management, where no single party holds the complete signing key. A t-of-n threshold scheme would require collusion of t parties to forge a signature, strengthening the Layer 1 guarantee against insider compromise.
13.9 Adversarial Robustness
Characterize the boundary between agents that AAP/AIP can detect and those they cannot. Preliminary analysis suggests meaningful friction against unsophisticated adversaries but not against agents specifically optimized to evade detection. Red-team exercises with adversarial agent deployments are planned.
14. Conclusion
As AI agents transition from assistants to autonomous actors, the question is not whether agents will coordinate — they already are — but whether that coordination will be observable, auditable, and provably honest. This paper presents three complementary layers of infrastructure that address this question at increasing levels of rigor.
Transparency (AAP): Alignment Cards declare what agents claim. AP-Traces record what agents do. Value Coherence Handshakes verify compatibility before coordination. This layer makes behavior observable — but the agent controls what it logs.
Integrity (AIP): Integrity Checkpoints reveal what agents think. Conscience prompts analyze reasoning against declared values. Side-channel signals enable intervention before harm occurs. This layer makes intent analyzable — but the analysis engine becomes a new trust dependency.
Proof (Cryptographic Attestation): Ed25519 signatures prevent verdict fabrication. Hash chains prevent retroactive modification. Merkle trees prevent selective deletion. Zero-knowledge proofs via SP1 zkVM prove that verdict derivation rules were applied correctly. This layer makes oversight verifiable — removing the need to trust the oversight service itself.
Together, the three layers close a loop that begins with observation and ends with cryptographic evidence. Each layer addresses the trust gap introduced by the one before it. AAP introduces the need for independent analysis. AIP introduces the need for verifiable oversight. The proof layer delivers that verification — and is honest about the boundary it cannot cross: the LLM analysis stage remains a trust assumption, clearly delineated and openly stated. The critical contribution of the proof layer is the auditor-model distinction.
By proving the deterministic judgment rules rather than the LLM inference, we achieve a system that is practical (~$0.001 per proof), fast (~700 ms on GPU), and meaningful (the verdict derivation was honest) — without waiting for the still-distant day when full inference proofs become viable for frontier models. This is, to our knowledge, the first operational system that provides zero-knowledge proofs of AI safety judgment.
The infrastructure is open. The implementations are available. The certificates are publicly verifiable. The limitations are stated. What remains is the work of building alignment infrastructure that scales with the agents we are deploying — infrastructure where the response to “why should I trust this?” is not “because we said so” but “because the math checks out.”
References
- A2A Protocol Specification. Google/Linux Foundation, 2025.
- Model Context Protocol (MCP). Anthropic/Linux Foundation, 2025.
- Agent Payments Protocol (AP2). Google, 2026.
- McKinsey Global AI Survey. McKinsey & Company, 2025.
- AI Agent Trust Survey. PwC, 2025.
- Consumer AI Adoption Report. Bain & Company, 2025.
- EU AI Act. Regulation (EU) 2024/1689. European Parliament and Council, 2024.
- NIST NCCoE Concept Paper: “Accelerating the Adoption of Software and AI Agent Identity and Authorization.” February 2026.
- ISO/IEC 42001:2023. Artificial Intelligence — Management System.
- IEEE 7001-2021. Transparency of Autonomous Systems.
- IEEE 3152-2024. Transparent Human and Machine Agency Identification.
- Singapore IMDA Model AI Governance Framework for Agentic AI. January 2026.
- OpenTelemetry GenAI Semantic Conventions. CNCF, 2025.
- BCP 14 (RFC 2119, RFC 8174). Key words for use in RFCs.
- NIST SP 800-207. Zero Trust Architecture. August 2020.
- NIST SP 800-63-4. Digital Identity Guidelines. 2024.
- Succinct Labs. “SP1 zkVM Documentation.” 2025.
- Paulmillr. “@noble/ed25519: Fastest JS implementation of Ed25519.” 2024.
- C2PA (Coalition for Content Provenance and Authenticity). “C2PA Technical Specification v2.1.” 2025.
- W3C. “Verifiable Credentials Data Model v2.0.” W3C Recommendation. 2024.
- NIST. “Artificial Intelligence Risk Management Framework (AI RMF 1.0).” NIST AI 100-1. January 2023.
- World Economic Forum. “Presidio AI Framework: Towards Safe Generative AI Models.” 2024.
- World Economic Forum. “Navigating the AI Frontier: Agent Governance.” AI Governance Alliance. January 2026.
- Modulus Labs. “The Cost of Intelligence: Proving AI with Zero-Knowledge.” 2024.
- EQTY Lab. “EQTY AI: Trusted AI Infrastructure.” 2025.
- Merkle, R. C. “A Certified Digital Signature.” Advances in Cryptology — CRYPTO ‘89. Springer, 1989.
- Bernstein, D. J., Duif, N., Lange, T., Schwabe, P., and Yang, B.-Y. “High-speed high-security signatures.” Journal of Cryptographic Engineering, 2(2):77–89, 2012.
- FIPS 180-4. “Secure Hash Standard (SHS).” National Institute of Standards and Technology. August 2015.
- Ben-Sasson, E., Bentov, I., Horesh, Y., and Riabzev, M. “Scalable, transparent, and post-quantum secure computational integrity.” IACR Cryptology ePrint Archive, 2018.
- Goldwasser, S., Micali, S., and Rackoff, C. “The Knowledge Complexity of Interactive Proof Systems.” SIAM Journal on Computing, 18(1):186–208, 1989.
- ISO/IEC 42005:2025. Artificial Intelligence — AI Impact Assessment.
Appendix A: Implementation Availability
| Component | Language | Package |
|---|---|---|
| AAP SDK | Python 3.11+ | pip install aap-sdk (v0.1.8) |
| AAP SDK | TypeScript | npm install @mnemom/aap (v0.1.8) |
| AIP SDK | Python 3.11+ | pip install aip-sdk (v0.1.5) |
| AIP SDK | TypeScript | npm install @mnemom/aip (v0.1.5) |
| AIP Verifier | TypeScript | npm install @mnemom/aip-verifier (offline certificate verification) |
| AIP OTel Exporter | TypeScript | npm install @mnemom/aip-otel-exporter (27 span attributes) |
| AIP OTel Exporter | Python | pip install aip-otel-exporter |
| zkVM Guest Program | Rust (no_std) | aip-zkvm-core, aip-zkvm-methods (SP1 v6.0.1, H100 GPU) |
| JSON Schemas | JSON Schema | github.com/mnemom-ai/aap/schemas |
| Reference Implementations | Python, TypeScript | github.com/mnemom-ai/aap/examples |
| Smoltbot (Reference Deployment) | TypeScript (Cloudflare Workers) | github.com/mnemom-ai/smoltbot |
Appendix B: Test Coverage
| Component | Tests | Coverage |
|---|---|---|
| AAP Python SDK | 242 | 96% |
| AAP TypeScript SDK | 199 | 94% |
| AIP Python SDK | — | — |
| AIP TypeScript SDK | — | — |
| JSON Schema Validation | 92 | 100% |
| Attestation (signing, chain, merkle, certificate) | 162 | — |
| Integration (pipeline, verification, chain, proofs) | 32 | — |
| zkVM Guest Program (verdict derivation) | 9 | — |
Appendix C: Glossary
- Agent: An autonomous software entity capable of taking actions on behalf of a principal.
- Alignment Card: A structured declaration of an agent’s alignment posture, shared by AAP, AIP, and the proof layer.
- AP-Trace: An audit log entry recording an agent’s decision process (AAP).
- Attestation: The process of creating cryptographic evidence (signatures, chains, Merkle proofs, ZK proofs) binding an integrity verdict to its inputs and history.
- Autonomy Envelope: The set of actions an agent may take without escalation, and the conditions that trigger escalation.
- Braid: Structured interchange format for dialogue across difference.
- Chain Hash: A SHA-256 hash linking each checkpoint to its predecessor, creating a tamper-evident sequence.
- Conscience Prompt: The analysis prompt sent to the evaluation LLM that instructs it to assess a thinking block against the Alignment Card (AIP).
- Conscience Value: A typed value declaration (BOUNDARY, FEAR, COMMITMENT, BELIEF, HOPE) that augments Alignment Card evaluation (AIP).
- Daimonion: The analysis model’s role in AIP — a conscience voice that is silent when aligned and speaks when outside boundaries.
- Drift: Behavioral deviation from declared alignment posture over time (detected by both AAP and AIP at different timescales).
- Escalation: The process of deferring a decision to a principal or higher-authority agent.
- Input Commitment: A SHA-256 hash over all inputs to an integrity analysis, binding the verdict to specific inputs.
- Integrity Certificate: A self-describing JSON document packaging all cryptographic evidence for a checkpoint, modeled on C2PA and W3C VC.
- Integrity Checkpoint: A structured verdict on an agent’s thinking block (AIP).
- Integrity Signal: The complete payload delivered after an integrity check, containing the checkpoint, proceed recommendation, and window summary (AIP).
- Merkle Tree: A per-agent append-only binary hash tree enabling O(log N) inclusion proofs over checkpoint history.
- Principal: The human or organization whose interests the agent serves.
- SSM (Self-Similarity Matrix): A computational structure measuring similarity between messages across a conversation.
- STARK Proof: A Scalable Transparent Argument of Knowledge generated by SP1 zkVM, proving that verdict derivation rules were applied correctly.
- Thinking Block: The internal reasoning content produced by an LLM before or alongside its visible response.
- Value Coherence: The degree to which two agents’ declared values are compatible for coordination.
- Verdict Derivation: The deterministic process of mapping analysis concerns to verdicts and actions — the specific computation proven by the zkVM.
- Verification: The process of checking whether observed behavior is consistent with declared alignment (AAP), or whether cryptographic evidence is valid (proof layer).
- Verification Gap: The distance between trusting an integrity service’s output and independently confirming that the service applied its stated rules honestly.

This document is released under CC BY 4.0. Copyright 2026 Mnemom LLC.