AAP Architecture

This document describes the system architecture of the Agent Alignment Protocol (AAP), including component relationships, data flow, and extension points.

Protocol Stack

AAP operates as an alignment layer that extends existing agent protocols:

+---------------------------------------------------------------------------+
|                         Applications                                       |
|            (Agent Systems, Orchestration Platforms)                        |
+---------------------------------------------------------------------------+
|                  AGENT ALIGNMENT PROTOCOL (AAP)                            |
|  +------------------+-----------------+----------------------------+      |
|  |  Alignment Card  |    AP-Trace     |   Value Coherence          |      |
|  |  (Declaration)   |  (Audit Trail)  |   (Multi-Agent Check)      |      |
|  +------------------+-----------------+----------------------------+      |
+---------------------------------------------------------------------------+
|           A2A Protocol          |         MCP Protocol                     |
|      (Agent-to-Agent Tasks)     |    (Tool Connectivity)                   |
+---------------------------------------------------------------------------+
|                         Transport Layer                                    |
|                   (HTTP, WebSocket, gRPC, etc.)                           |
+---------------------------------------------------------------------------+

Key insight: AAP does not replace A2A or MCP — it extends them with alignment primitives.

Component Architecture

Overview

+---------------------------------------------------------------------------+
|                           AAP SDK                                          |
|                                                                            |
|  +---------------------------------------------------------------------+  |
|  |                      Public API (3 entry points)                     |  |
|  |  verify_trace()    check_coherence()    detect_drift()              |  |
|  +---------------------------------------------------------------------+  |
|                                |                                           |
|  +-----------------------------+---------------------------------------+  |
|  |                    Verification Engine                               |  |
|  |  +-------------+  +-------------+  +---------------------+         |  |
|  |  |   api.py    |  | features.py |  |     models.py       |         |  |
|  |  | Orchestrate |  |  TF-IDF     |  |  Result Dataclasses |         |  |
|  |  | Checks      |  |  Extraction |  |  Violation Types    |         |  |
|  |  +-------------+  +-------------+  +---------------------+         |  |
|  |                          |                                          |  |
|  |  +-----------------------+--------------------------------------+  |  |
|  |  |                    constants.py                               |  |  |
|  |  |  SIMILARITY_THRESHOLD = 0.30  |  SUSTAINED_TURNS = 3         |  |  |
|  |  +--------------------------------------------------------------+  |  |
|  +---------------------------------------------------------------------+  |
|                                                                            |
|  +---------------------------------------------------------------------+  |
|  |                       Schema Layer                                   |  |
|  |  +-----------------+  +------------+  +-------------------+         |  |
|  |  | alignment_card  |  |  ap_trace  |  |  value_coherence  |         |  |
|  |  |   .py / .ts     |  |  .py / .ts |  |     .py / .ts     |         |  |
|  |  |   Pydantic      |  |  Pydantic  |  |     Pydantic      |         |  |
|  |  +-----------------+  +------------+  +-------------------+         |  |
|  +---------------------------------------------------------------------+  |
|                                                                            |
|  +---------------------------------------------------------------------+  |
|  |                    JSON Schemas (Interop)                            |  |
|  |  alignment-card.schema.json  |  ap-trace.schema.json                |  |
|  |                  value-coherence.schema.json                         |  |
|  +---------------------------------------------------------------------+  |
+---------------------------------------------------------------------------+

Schemas Module (`aap.schemas`)

The schemas module provides Pydantic models for the three core AAP components:

Alignment Card (`alignment_card.py`)

AlignmentCard
+-- aap_version: str ("0.1.0")
+-- card_id: str (unique identifier)
+-- agent_id: str (agent DID or identifier)
+-- issued_at: datetime
+-- expires_at: datetime (optional)
+-- principal: Principal
|   +-- type: PrincipalType (human | organization | agent)
|   +-- identifier: str (optional, e.g., DID)
|   +-- relationship: RelationshipType (delegated_authority | supervised | autonomous)
|   +-- escalation_contact: str (optional, mailto: or https:)
+-- values: Values
|   +-- declared: list[str] (e.g., ["principal_benefit", "transparency"])
|   +-- conflicts_with: list[str] (optional)
|   +-- hierarchy: str (optional, "lexicographic" | "weighted")
|   +-- definitions: dict[str, ValueDefinition] (optional, custom values)
+-- autonomy_envelope: AutonomyEnvelope
|   +-- bounded_actions: list[str] (allowed without escalation)
|   +-- escalation_triggers: list[EscalationTrigger]
|   +-- forbidden_actions: list[str] (never allowed)
|   +-- max_autonomous_value: MonetaryValue (optional)
+-- audit_commitment: AuditCommitment
|   +-- retention_days: int
|   +-- queryable: bool
|   +-- tamper_evidence: str (optional, "merkle" | "blockchain")
+-- extensions: dict (optional, protocol-specific extensions)
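
As an illustration of the tree above, a minimal card can be written as a plain Python dict. This is only a sketch: the identifiers and values (`card-001`, `did:example:agent-1`, the action names) are made up, and `missing_required_fields` is a hypothetical helper, not an SDK function. The SDK itself validates cards through its Pydantic models.

```python
from datetime import datetime, timezone

# Hypothetical example card; all identifiers and values are illustrative.
card = {
    "aap_version": "0.1.0",
    "card_id": "card-001",
    "agent_id": "did:example:agent-1",
    "issued_at": datetime(2025, 1, 1, tzinfo=timezone.utc).isoformat(),
    "principal": {"type": "human", "relationship": "delegated_authority"},
    "values": {"declared": ["principal_benefit", "transparency"]},
    "autonomy_envelope": {
        "bounded_actions": ["search", "recommend"],
        "escalation_triggers": [],
        "forbidden_actions": ["share_personal_data"],
    },
    "audit_commitment": {"retention_days": 90, "queryable": True},
}

# Top-level fields that the tree above does not mark as optional.
REQUIRED_FIELDS = (
    "aap_version", "card_id", "agent_id", "issued_at",
    "principal", "values", "autonomy_envelope", "audit_commitment",
)


def missing_required_fields(card: dict) -> list[str]:
    """Return the required top-level fields that are absent from a card dict."""
    return [name for name in REQUIRED_FIELDS if name not in card]


print(missing_required_fields(card))
```

A check like this only covers the presence of top-level keys; nested fields and enum values would still need schema validation.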

AP-Trace (`ap_trace.py`)

APTrace
+-- trace_id: str (unique identifier)
+-- agent_id: str (must match card's agent_id)
+-- card_id: str (references Alignment Card)
+-- timestamp: datetime
+-- action: Action
|   +-- type: ActionType (recommend | execute | delegate | escalate)
|   +-- name: str (action identifier)
|   +-- category: ActionCategory (bounded | escalation_trigger | forbidden)
|   +-- target: ActionTarget (optional)
|   +-- parameters: dict (optional)
+-- decision: Decision
|   +-- alternatives_considered: list[Alternative]
|   |   +-- option_id: str
|   |   +-- description: str
|   |   +-- score: float (0.0-1.0)
|   +-- selected: str (option_id of chosen alternative)
|   +-- selection_reasoning: str (human-readable explanation)
|   +-- values_applied: list[str] (must be subset of declared values)
|   +-- confidence: float (optional, 0.0-1.0)
+-- escalation: Escalation
|   +-- evaluated: bool (was escalation logic run?)
|   +-- required: bool (did triggers fire?)
|   +-- reason: str
|   +-- escalation_status: str (optional: "pending" | "approved" | "denied" | "timeout")
|   +-- principal_response: dict (optional)
+-- context: TraceContext (optional)
    +-- session_id: str
    +-- parent_trace_id: str (for delegation chains)
    +-- custom: dict
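
For orientation, a trace following the tree above might look like the dict below. The identifiers, alternatives, and reasoning text are invented for illustration, and `selected_is_consistent` is a hypothetical helper that checks one structural rule from the tree: `decision.selected` should name the `option_id` of a considered alternative.

```python
# Hypothetical example trace; identifiers and values are illustrative.
trace = {
    "trace_id": "trace-001",
    "agent_id": "did:example:agent-1",
    "card_id": "card-001",
    "timestamp": "2025-01-02T10:00:00+00:00",
    "action": {"type": "recommend", "name": "recommend", "category": "bounded"},
    "decision": {
        "alternatives_considered": [
            {"option_id": "A", "description": "Cheapest option", "score": 0.6},
            {"option_id": "B", "description": "Best-reviewed option", "score": 0.8},
        ],
        "selected": "B",
        "selection_reasoning": "Higher review score outweighs the price difference.",
        "values_applied": ["principal_benefit"],
        "confidence": 0.8,
    },
    "escalation": {"evaluated": True, "required": False, "reason": "No trigger matched"},
}


def selected_is_consistent(trace: dict) -> bool:
    """Check that decision.selected names one of the considered alternatives."""
    decision = trace["decision"]
    option_ids = {alt["option_id"] for alt in decision["alternatives_considered"]}
    return decision["selected"] in option_ids


print(selected_is_consistent(trace))
```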

Value Coherence (`value_coherence.py`)

ValueCoherenceCheck
+-- initiator_card: AlignmentCard (partial)
+-- responder_card: AlignmentCard (partial)
+-- task_values: list[str] (optional, values required for task)
+-- result: CoherenceResult
    +-- compatible: bool
    +-- score: float (0.0-1.0)
    +-- value_alignment: ValueAlignment
    |   +-- matched: list[str]
    |   +-- unmatched: list[str]
    |   +-- conflicts: list[ValueConflict]
    +-- proceed: bool
    +-- proposed_resolution: dict (optional)

Verification Engine (`aap.verification`)

The verification engine implements the three core operations:

`verify_trace(trace, card) -> VerificationResult`

Performs six verification checks (SPEC Section 7.3):

card_reference    -> Does trace.card_id match card.card_id?
card_expiration   -> Is card still valid (not expired)?
autonomy          -> Is action.name in bounded_actions?
forbidden         -> Is action.name NOT in forbidden_actions?
escalation        -> If trigger matched, was escalation.required=true?
values            -> Are values_applied subset of declared values?

Returns VerificationResult:

verified: bool — True if no violations
violations: list[Violation] — Type, description, severity
warnings: list[Warning] — Near-boundary conditions
verification_metadata — Algorithm version, checks performed, duration
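
The six checks can be sketched over plain dicts as follows. This is a simplified illustration, not the SDK's implementation: `verify_trace_sketch` is a hypothetical name, the result is a plain dict rather than a `VerificationResult`, and trigger matching is reduced to one assumption (a trace whose `action.category` is `"escalation_trigger"` is treated as having matched a trigger).

```python
from datetime import datetime, timezone
from typing import Optional


def verify_trace_sketch(trace: dict, card: dict, now: Optional[datetime] = None) -> dict:
    """Run simplified versions of the six checks and collect violation names."""
    now = now or datetime.now(timezone.utc)
    action = trace["action"]
    envelope = card["autonomy_envelope"]
    violations = []

    # 1. card_reference: the trace must point at this card.
    if trace["card_id"] != card["card_id"]:
        violations.append("card_reference")
    # 2. card_expiration: expires_at is optional; timestamps must be timezone-aware.
    expires_at = card.get("expires_at")
    if expires_at is not None and datetime.fromisoformat(expires_at) <= now:
        violations.append("card_expiration")
    # 3. autonomy: the action must be listed in bounded_actions.
    if action["name"] not in envelope.get("bounded_actions", []):
        violations.append("autonomy")
    # 4. forbidden: the action must not be listed in forbidden_actions.
    if action["name"] in envelope.get("forbidden_actions", []):
        violations.append("forbidden")
    # 5. escalation (assumption: category marks a matched trigger).
    if action.get("category") == "escalation_trigger" and not trace["escalation"]["required"]:
        violations.append("escalation")
    # 6. values: applied values must be a subset of declared values.
    if not set(trace["decision"]["values_applied"]) <= set(card["values"]["declared"]):
        violations.append("values")

    return {"verified": not violations, "violations": violations}


# Illustrative inputs; only the fields the checks read are included.
card = {
    "card_id": "card-001",
    "expires_at": "2030-01-01T00:00:00+00:00",
    "values": {"declared": ["principal_benefit", "transparency"]},
    "autonomy_envelope": {
        "bounded_actions": ["search", "recommend"],
        "forbidden_actions": ["share_personal_data"],
    },
}
trace = {
    "card_id": "card-001",
    "action": {"name": "recommend", "category": "bounded"},
    "decision": {"values_applied": ["principal_benefit"]},
    "escalation": {"evaluated": True, "required": False},
}
check_time = datetime(2025, 6, 1, tzinfo=timezone.utc)
result = verify_trace_sketch(trace, card, now=check_time)
print(result)
```

Passing `now` explicitly keeps the expiration check deterministic, which is convenient when replaying stored traces.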

`check_coherence(my_card, their_card) -> CoherenceResult`

Computes value compatibility score (SPEC Section 6.4):

score = (matched / required) * (1 - conflict_penalty)
where conflict_penalty = 0.5 * (conflicts / required)

Returns CoherenceResult:

compatible: bool — No conflicts AND score >= 0.5
score: float — Coherence score [0, 1]
value_alignment — Matched, unmatched, conflicts
proceed: bool — Safe to collaborate
proposed_resolution — If incompatible, suggests escalation
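
A worked example may make the score formula concrete. The sketch below makes two assumptions the text above does not spell out: "required" is taken to be the task's values when given, otherwise the initiator's declared values, and a conflict is any value one card declares that the other lists in `conflicts_with`. `coherence_sketch` is a hypothetical name and returns a plain dict.

```python
from typing import Optional

CONFLICT_PENALTY_MULTIPLIER = 0.5
MIN_COHERENCE_FOR_PROCEED = 0.5


def coherence_sketch(my_card: dict, their_card: dict,
                     task_values: Optional[list] = None) -> dict:
    """Compute score = (matched / required) * (1 - conflict_penalty)."""
    mine = set(my_card["values"]["declared"])
    theirs = set(their_card["values"]["declared"])
    # Assumption: required values come from the task, else from my card.
    required = set(task_values) if task_values else mine
    matched = required & theirs
    # Assumption: a conflict is a declared value the other side lists in conflicts_with.
    conflicts = (mine & set(their_card["values"].get("conflicts_with", []))) | (
        theirs & set(my_card["values"].get("conflicts_with", []))
    )
    if not required:
        score = 0.0
    else:
        penalty = CONFLICT_PENALTY_MULTIPLIER * len(conflicts) / len(required)
        score = (len(matched) / len(required)) * (1 - penalty)
        score = max(0.0, min(1.0, score))  # keep the score within [0, 1]
    return {
        "compatible": not conflicts and score >= MIN_COHERENCE_FOR_PROCEED,
        "score": score,
        "matched": sorted(matched),
        "unmatched": sorted(required - theirs),
        "conflicts": sorted(conflicts),
    }


mine = {"values": {"declared": ["principal_benefit", "transparency", "privacy", "honesty"]}}
theirs = {"values": {"declared": ["principal_benefit", "transparency", "honesty", "speed"]}}
theirs_conflicting = {
    "values": {
        "declared": ["principal_benefit", "transparency", "honesty", "speed"],
        "conflicts_with": ["privacy"],
    }
}

no_conflict = coherence_sketch(mine, theirs)
with_conflict = coherence_sketch(mine, theirs_conflicting)
print(no_conflict["score"], with_conflict["score"])
```

With three of four required values matched and no conflicts, the score is 3/4 = 0.75. Adding one conflict gives a penalty of 0.5 * (1/4) = 0.125, so the score becomes 0.75 * 0.875 = 0.65625, and the pair is not compatible because a conflict exists.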

`detect_drift(card, traces, thresholds) -> list[DriftAlert]`

Analyzes trace sequence for behavioral drift (SPEC Section 8):

For each trace:
    1. Extract features (TF-IDF on values, actions)
    2. Compare to card features via cosine similarity
    3. Track low-similarity streak
    4. If streak >= sustained_threshold:
       -> Generate DriftAlert with direction inference

Drift directions:

value_drift — Using undeclared values (>30% of recent)
autonomy_expansion — Escalation rate dropped by >50%
principal_misalignment — Declining confidence on principal_benefit
unknown — Pattern doesn’t match known categories
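
The streak-tracking part of the loop (steps 3 and 4) can be sketched in isolation, taking precomputed similarity scores as input. `drift_alerts_from_similarities` is a hypothetical helper, alerts are plain dicts, and direction inference is left out (every alert is tagged `"unknown"`). One design choice here is an assumption: the sketch emits a single alert per streak and extends its `trace_ids` while the streak continues, rather than emitting a new alert for every further low-similarity trace.

```python
def drift_alerts_from_similarities(
    similarities: list,
    trace_ids: list,
    similarity_threshold: float = 0.30,
    sustained_threshold: int = 3,
) -> list:
    """Turn per-trace similarity scores into one alert per sustained low streak."""
    alerts = []
    streak = []  # trace ids in the current low-similarity streak
    for trace_id, similarity in zip(trace_ids, similarities):
        if similarity >= similarity_threshold:
            streak = []  # similarity recovered, so the streak resets
            continue
        streak.append(trace_id)
        if len(streak) == sustained_threshold:
            # The streak just became "sustained": open a new alert.
            alerts.append({"direction": "unknown", "trace_ids": list(streak)})
        elif len(streak) > sustained_threshold:
            # The streak continues: extend the alert that is already open.
            alerts[-1]["trace_ids"].append(trace_id)
    return alerts


similarities = [0.8, 0.2, 0.25, 0.1, 0.05, 0.6]
trace_ids = ["T1", "T2", "T3", "T4", "T5", "T6"]
alerts = drift_alerts_from_similarities(similarities, trace_ids)
print(alerts)
```

In this run, T2 through T4 form the first three consecutive low-similarity traces, T5 extends the same streak, and T6 resets it, so one alert covering T2 to T5 is produced.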

Feature Extraction (`features.py`)

TF-IDF-based feature extraction for drift detection:

class FeatureExtractor:
    def extract_card_features(self, card: dict) -> dict[str, float]:
        """Return a feature vector from declared values and bounded/forbidden actions."""
        ...

    def extract_trace_features(self, trace: dict) -> dict[str, float]:
        """Return a feature vector from applied values, action, and escalation."""
        ...

def cosine_similarity(features1: dict, features2: dict) -> float:
    """Compute the similarity between two feature vectors."""
    ...
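
For sparse feature vectors stored as dicts, cosine similarity can be computed as below. This is a generic sketch of the standard formula, not the SDK's code: `cosine_similarity_sketch` is a hypothetical name, and returning 0.0 for an all-zero or empty vector is an assumption about how the edge case should be handled.

```python
import math


def cosine_similarity_sketch(features1: dict, features2: dict) -> float:
    """Cosine similarity between two sparse vectors keyed by feature name."""
    # Only features present in both vectors contribute to the dot product.
    shared = features1.keys() & features2.keys()
    dot = sum(features1[name] * features2[name] for name in shared)
    norm1 = math.sqrt(sum(value * value for value in features1.values()))
    norm2 = math.sqrt(sum(value * value for value in features2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # assumption: an empty vector is treated as dissimilar
    return dot / (norm1 * norm2)


# Illustrative weights; real TF-IDF weights would not all be 1.0.
card_features = {"transparency": 1.0, "principal_benefit": 1.0}
trace_features = {"transparency": 1.0, "speed": 1.0}
similarity = cosine_similarity_sketch(card_features, trace_features)
print(round(similarity, 3))
```

Here the vectors share one of two equally weighted features, so the similarity is approximately 0.5, which is above the default threshold of 0.30.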

Constants (`constants.py`)

Calibrated thresholds derived from corpus analysis (see calibration):

ALGORITHM_VERSION = "0.1.0"
DEFAULT_SIMILARITY_THRESHOLD = 0.30      # Alert below this
DEFAULT_SUSTAINED_TURNS_THRESHOLD = 3    # Alert after N consecutive low traces
NEAR_BOUNDARY_THRESHOLD = 0.35           # Warn when confidence is below this
CONFLICT_PENALTY_MULTIPLIER = 0.5        # Multiplier on the conflict ratio in the coherence score
MIN_COHERENCE_FOR_PROCEED = 0.5          # Minimum score to proceed

Data Flow

Single Trace Verification

+-----------------+     +-----------------+
|  Alignment Card |     |    AP-Trace     |
|   (JSON/dict)   |     |   (JSON/dict)   |
+--------+--------+     +--------+--------+
         |                       |
         +-----------+-----------+
                     |
                     v
         +-----------------------+
         |     verify_trace()    |
         |  +-----------------+  |
         |  | 1. Card Match   |  |
         |  | 2. Expiration   |  |
         |  | 3. Autonomy     |  |
         |  | 4. Forbidden    |  |
         |  | 5. Escalation   |  |
         |  | 6. Values       |  |
         |  +-----------------+  |
         +-----------+-----------+
                     |
                     v
         +-----------------------+
         |  VerificationResult   |
         |  - verified: bool     |
         |  - violations: [...]  |
         |  - warnings: [...]    |
         +-----------------------+

Multi-Agent Coherence Check

+-----------------+     +-----------------+
|  Agent A Card   |     |  Agent B Card   |
|  (My Values)    |     |  (Their Values) |
+--------+--------+     +--------+--------+
         |                       |
         +-----------+-----------+
                     |
                     v
         +-----------------------+
         |   check_coherence()   |
         |  +-----------------+  |
         |  | 1. Extract vals |  |
         |  | 2. Find matches |  |
         |  | 3. Conflicts    |  |
         |  | 4. Compute score|  |
         |  +-----------------+  |
         +-----------+-----------+
                     |
                     v
         +-----------------------+
         |   CoherenceResult     |
         |  - compatible: bool   |
         |  - score: 0.0-1.0     |
         |  - conflicts: [...]   |
         |  - proceed: bool      |
         +-----------------------+
                     |
         +-----------+-----------+
         v                       v
    [proceed=true]         [proceed=false]
    Safe to delegate       Escalate to principals

Drift Detection Over Time

+-----------------+     +--------------------------------------+
|  Alignment Card |     |     Trace Sequence (chronological)   |
|  (Baseline)     |     | [T1] -> [T2] -> [T3] -> [T4] -> [T5] |
+--------+--------+     +-----------------+--------------------+
         |                                |
         +----------------+---------------+
                          |
                          v
              +-----------------------+
              |    detect_drift()     |
              |  +-----------------+  |
              |  | For each trace: |  |
              |  |  - Extract feat |  |
              |  |  - Cosine sim   |  |
              |  |  - Track streak |  |
              |  +-----------------+  |
              +-----------+-----------+
                          |
              +-----------+-----------+
              v                       v
      [similarity >= 0.30]    [similarity < 0.30]
      Reset streak            Increment streak
                                     |
                              streak >= 3?
                                     |
                              +------+------+
                              v             v
                         [No alert]    [DriftAlert]
                                       - direction
                                       - indicators
                                       - trace_ids

Extension Points

1. Custom Values

Define domain-specific values in values.definitions:

{
  "values": {
    "declared": ["principal_benefit", "sustainability"],
    "definitions": {
      "sustainability": {
        "name": "Environmental Sustainability",
        "description": "Prefer options minimizing environmental impact",
        "priority": 3
      }
    }
  }
}

2. Protocol Extensions

Add protocol-specific data in extensions:

{
  "extensions": {
    "a2a": {
      "skills": ["search", "recommend"],
      "agent_card_url": "https://agent.example.com/.well-known/agent.json"
    },
    "mcp": {
      "tools": ["filesystem_read", "web_search"],
      "server_name": "my-tools"
    }
  }
}

3. Custom Escalation Triggers

Define complex conditions in escalation_triggers:

{
  "escalation_triggers": [
    {
      "condition": "action_type == \"purchase\"",
      "action": "escalate",
      "reason": "Purchases require approval"
    },
    {
      "condition": "amount > 100",
      "action": "escalate",
      "reason": "High-value transactions"
    },
    {
      "condition": "shares_personal_data",
      "action": "deny",
      "reason": "Never share PII"
    }
  ]
}

Supported condition syntax (SPEC Section 4.6):

field == "value" — String equality
field > N — Numeric comparison (>, <, >=, <=, !=)
field_name — Boolean check (truthy)

4. Verification Customization

Override default thresholds:

# Custom drift detection thresholds
alerts = detect_drift(
    card,
    traces,
    similarity_threshold=0.25,  # More sensitive
    sustained_threshold=2,       # Faster alerting
)

5. Integration Hooks

For A2A integration, extend the Agent Card:

{
  "name": "My Agent",
  "skills": [...],
  "alignment": {
    "$ref": "./alignment-card.json"
  }
}

For MCP integration, add alignment to tool manifests:

{
  "tools": [...],
  "resources": [
    {
      "uri": "alignment://card",
      "name": "Alignment Card",
      "mimeType": "application/json"
    }
  ]
}

Implementation Notes

Python SDK

Location: src/aap/
Models: Pydantic v2 with strict validation
Type hints: Full coverage, py.typed marker
Dependencies: Only pydantic>=2.0

pip install agent-alignment-protocol

TypeScript SDK

Location: typescript/src/
Output formats: CJS, ESM, DTS
Types: Full TypeScript types, no any
Dependencies: None (zero runtime deps)

npm install agent-alignment-protocol

JSON Schemas

Location: schemas/
Format: JSON Schema Draft 2020-12
Generated from: Pydantic models via model_json_schema()

Schemas can be used for:

Validation in any language (ajv, jsonschema, etc.)
Code generation (quicktype, json-schema-to-typescript)
Documentation (JSON Schema viewers)

Browser (Playground)

Location: docs/playground/
Runtime: Pyodide (Python in WASM)
API: window.AAP.verifyTrace(), etc.
No server: All verification runs client-side

Security Considerations

See security for the full threat model. Key points:

AAP does not ensure alignment — It provides visibility, not guarantees
AP-Traces are self-reported — Adversarial agents can lie
Verification is point-in-time — Does not prevent future violations
Thresholds are calibrated — But may not fit all domains

Defense in depth:

Use AAP alongside behavioral monitoring
Implement rate limiting and anomaly detection
Maintain human oversight for high-stakes decisions
Regularly audit AP-Trace storage for integrity

References

specification — Full protocol specification
limitations — What AAP does NOT guarantee
calibration — Threshold derivation methodology
quickstart — 5-minute integration guide
A2A integration — A2A integration guide
MCP migration — MCP integration guide
