
Enforcement Modes

Smoltbot supports three enforcement modes that control how the system responds when alignment or integrity violations are detected. You can choose the level of intervention appropriate for your use case — from passive observation to active blocking.

Modes Overview

Observe

Detect violations, record them, take no action. This is the default mode. Ideal for initial deployment and monitoring.

Nudge

Detect violations and inject feedback into the agent’s next request via system prompt. The agent sees it and can self-correct.

Enforce

Hard block with 403 for non-streaming requests. Falls back to nudge for streaming requests.

Mode Details

Observe Mode (Default)

In observe mode, smoltbot detects and records all violations but takes no action to modify agent behavior. This is the default mode for all new agents.
Behavior:
  • All API calls pass through unchanged
  • Violations are detected and recorded in the trace database
  • Integrity checkpoints are created for every interaction
  • Drift alerts are generated when behavioral patterns shift
  • No modification to agent requests or responses
When to use:
  • During initial deployment to establish behavioral baselines
  • When you want to monitor without affecting agent behavior
  • For compliance auditing where you need a record but not intervention
  • When evaluating whether to enable more active enforcement
Configuration:
curl -X PUT https://api.mnemom.ai/v1/agents/:id/enforcement \
  -H "Content-Type: application/json" \
  -d '{"mode": "observe"}'

Setting Enforcement Mode

Set enforcement mode via the API:
PUT /v1/agents/:id/enforcement
Request body:
{
  "mode": "observe" | "nudge" | "enforce"
}
Example:
# Enable nudge mode
curl -X PUT https://api.mnemom.ai/v1/agents/agent_abc123/enforcement \
  -H "Content-Type: application/json" \
  -d '{"mode": "nudge"}'
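For callers who prefer a script over curl, the same request can be built with Python's standard library. This is a minimal sketch: the helper name `build_enforcement_request` is hypothetical, it only constructs the request (sending is left to the caller), and it adds no authentication header because the curl example above shows none.

```python
import json
import urllib.request

API_BASE = "https://api.mnemom.ai/v1"
VALID_MODES = {"observe", "nudge", "enforce"}


def build_enforcement_request(agent_id: str, mode: str) -> urllib.request.Request:
    """Build (but do not send) the PUT request that sets an agent's enforcement mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    return urllib.request.Request(
        url=f"{API_BASE}/agents/{agent_id}/enforcement",
        data=json.dumps({"mode": mode}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Rejecting unknown modes locally catches typos before a request is made. To send the request, pass the result to `urllib.request.urlopen`.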

Nudge Strategy

When enforcement mode is nudge or enforce, you can further control when nudges are created using the nudge strategy setting:
| Strategy | Behavior |
| --- | --- |
| always | Every boundary violation creates a nudge (default) |
| sampling | Nudge on a percentage of violations (uses proof_rate or dedicated nudge_rate) |
| threshold | Only nudge after N violations in the current session |
| off | No nudging; violations are recorded but no correction is injected |
Configuration:
curl -X PUT https://api.mnemom.ai/v1/agents/:id/settings \
  -H "Content-Type: application/json" \
  -d '{"nudge_strategy": "always"}'
Use the threshold strategy to avoid alert fatigue. The agent only receives a nudge after repeated violations in the same session, giving it a chance to self-correct naturally first.
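The four strategies can be read as a single decision function. The sketch below is illustrative only and is not Smoltbot's implementation: the function name, the `threshold` and `rate_percent` parameters, and the choice to nudge once the session count reaches N (rather than exceeds it) are all assumptions.

```python
import random


def should_nudge(strategy, session_violations, threshold=3, rate_percent=100,
                 rng=random.random):
    """Decide whether a detected violation produces a nudge.

    session_violations counts violations in the current session, including this one.
    rng is injectable so the sampling branch can be tested deterministically.
    """
    if strategy == "always":
        return True
    if strategy == "off":
        return False
    if strategy == "threshold":
        # Assumption: the Nth violation is the first one that triggers a nudge.
        return session_violations >= threshold
    if strategy == "sampling":
        # rng() is in [0, 1); scale to a percentage and compare with the rate.
        return rng() * 100 < rate_percent
    raise ValueError(f"unknown nudge strategy: {strategy!r}")
```

With `threshold=3`, the first two violations in a session are recorded without a nudge and the third produces one.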

Per-Agent Feature Toggles

Team operations require the team_reputation feature flag, available on Team and Enterprise plans. See Pricing for plan details.
Each agent has independent controls for the transparency and integrity pipeline:
| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| aap_enabled | boolean | true | Enable AAP action traces |
| aip_enabled | boolean | true | Enable AIP integrity analysis |
| proof_enabled | boolean | true | Enable cryptographic attestation (Ed25519 + Merkle) |
| proof_rate | integer | 100 | Percentage of checkpoints that get full attestation |
| nudge_strategy | string | always | When to create nudges (always, sampling, threshold, off) |
Configuration:
# Disable AIP for an agent (AAP traces still flow)
curl -X PUT https://api.mnemom.ai/v1/agents/:id/settings \
  -H "Content-Type: application/json" \
  -d '{"aip_enabled": false}'

# Set proof sampling to 50%
curl -X PUT https://api.mnemom.ai/v1/agents/:id/settings \
  -H "Content-Type: application/json" \
  -d '{"proof_rate": 50}'
These settings are also available in the Agent Settings panel on the web dashboard for claimed agents.
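Before sending a settings update, it can help to check the payload against the table above. The following is a rough sketch, not an official client: `validate_settings` is a hypothetical helper, and the 0 to 100 range for `proof_rate` is an assumption inferred from its description as a percentage.

```python
SETTING_TYPES = {
    "aap_enabled": bool,
    "aip_enabled": bool,
    "proof_enabled": bool,
    "proof_rate": int,
    "nudge_strategy": str,
}
NUDGE_STRATEGIES = {"always", "sampling", "threshold", "off"}


def validate_settings(patch: dict) -> dict:
    """Check a partial settings payload and return it unchanged if it is valid."""
    for key, value in patch.items():
        expected = SETTING_TYPES.get(key)
        if expected is None:
            raise ValueError(f"unknown setting: {key}")
        # bool is a subclass of int in Python, so reject True/False for integers.
        wrong_type = not isinstance(value, expected) or (
            expected is int and isinstance(value, bool)
        )
        if wrong_type:
            raise TypeError(f"{key} must be of type {expected.__name__}")
    # Assumption: proof_rate is a whole percentage between 0 and 100.
    if "proof_rate" in patch and not 0 <= patch["proof_rate"] <= 100:
        raise ValueError("proof_rate must be between 0 and 100")
    if "nudge_strategy" in patch and patch["nudge_strategy"] not in NUDGE_STRATEGIES:
        raise ValueError(f"nudge_strategy must be one of {sorted(NUDGE_STRATEGIES)}")
    return patch
```

For example, `validate_settings({"proof_rate": 50})` returns the payload ready to be serialized as the body of the PUT request.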

Violation Types and Enforcement

Enforcement applies to all violation types detected by the AAP verification engine, AIP integrity checks, and the policy engine:

Alignment Violations

| Violation Type | Severity | Enforcement Behavior |
| --- | --- | --- |
| FORBIDDEN_ACTION | CRITICAL | Blocked in enforce mode; nudged in nudge mode |
| CARD_MISMATCH | CRITICAL | Blocked in enforce mode; nudged in nudge mode |
| UNBOUNDED_ACTION | HIGH | Blocked in enforce mode; nudged in nudge mode |
| MISSED_ESCALATION | HIGH | Blocked in enforce mode; nudged in nudge mode |
| CARD_EXPIRED | HIGH | Blocked in enforce mode; nudged in nudge mode |
| UNDECLARED_VALUE | MEDIUM | Nudged in nudge/enforce mode (not blocked) |

Policy Violations

Policy enforcement operates as a parallel system alongside alignment enforcement. While alignment enforcement checks agent behavior against card values, policy enforcement checks tool usage against governance rules.
| Violation Type | Severity | Enforcement Behavior |
| --- | --- | --- |
| POLICY_VIOLATION | HIGH | Blocked when policy enforcement mode is enforce; logged when warn |
| UNMAPPED_TOOL | MEDIUM | Logged as warning; behavior depends on defaults.unmapped_tool_action |
| CAPABILITY_MISMATCH | HIGH | Blocked when policy enforcement mode is enforce; logged when warn |
Policy enforcement is controlled independently via the enforcement_mode field in the Policy DSL:
  • warn — Log violations, return X-Policy-Verdict: warn header, allow request to proceed
  • enforce — Block requests with violations (HTTP 403), return X-Policy-Verdict: fail header
  • off — Skip policy evaluation entirely
The X-Policy-Verdict response header is always present when a policy is active:
| Header Value | Meaning |
| --- | --- |
| pass | All tools mapped and permitted |
| warn | Violations detected but not blocking |
| fail | Violations detected and request blocked (enforce mode only) |
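The relationship between the policy enforcement mode and the X-Policy-Verdict header can be sketched as a small function. This is a simplified illustration with a hypothetical name (`policy_verdict`): it treats violations as a single yes/no input, so it does not model per-severity handling or defaults.unmapped_tool_action, and it assumes no header is returned when evaluation is skipped.

```python
from typing import Optional, Tuple


def policy_verdict(enforcement_mode: str, has_violations: bool) -> Tuple[bool, Optional[str]]:
    """Return (blocked, X-Policy-Verdict value) for one request."""
    if enforcement_mode == "off":
        # Policy evaluation is skipped entirely; assumed to produce no header.
        return False, None
    if enforcement_mode not in {"warn", "enforce"}:
        raise ValueError(f"unknown policy enforcement mode: {enforcement_mode!r}")
    if not has_violations:
        return False, "pass"
    if enforcement_mode == "enforce":
        return True, "fail"  # request is rejected with HTTP 403
    return False, "warn"  # violation is logged and the request proceeds
```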
Alignment enforcement (observe/nudge/enforce) and policy enforcement (off/warn/enforce) can be configured independently. For example, you might use nudge for alignment violations while using enforce for policy violations, or vice versa.
See Policy Engine for full details on how policies are evaluated, and Policy Management for setup instructions.
In enforce mode, only CRITICAL and HIGH severity violations trigger hard blocks on non-streaming requests. MEDIUM severity violations are always handled via nudge, even in enforce mode. This applies to both alignment and policy violations.
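Taken together, the alignment rules above (mode, severity, and the streaming fallback) reduce to a short decision function. The sketch below is an illustration rather than Smoltbot's code: the function name is hypothetical, the returned labels reuse the action names recorded in the conscience timeline, and the nudge strategy setting is ignored.

```python
BLOCKING_SEVERITIES = {"CRITICAL", "HIGH"}


def enforcement_action(mode: str, severity: str, streaming: bool) -> str:
    """Return 'observed', 'nudged', or 'blocked' for a detected violation."""
    if mode == "observe":
        return "observed"
    if mode == "nudge":
        return "nudged"
    if mode == "enforce":
        # Hard blocks apply only to CRITICAL/HIGH violations on non-streaming requests.
        if severity in BLOCKING_SEVERITIES and not streaming:
            return "blocked"
        # MEDIUM violations and all streaming requests fall back to a nudge.
        return "nudged"
    raise ValueError(f"unknown enforcement mode: {mode!r}")
```

For instance, a FORBIDDEN_ACTION (CRITICAL) on a streaming request in enforce mode yields a nudge, while the same violation on a non-streaming request is blocked.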

Conscience Timeline

All enforcement actions are tracked in the conscience timeline, accessible via the API and the web dashboard at mnemom.ai. The timeline records:
  • When a violation was detected
  • What type and severity
  • What enforcement action was taken (observed, nudged, blocked)
  • Whether the agent self-corrected after a nudge
  • Drift patterns across enforcement events

Provider Compatibility

Enforcement works across all providers where AIP is supported:
| Provider | Observe | Nudge | Enforce (non-streaming) | Enforce (streaming) |
| --- | --- | --- | --- | --- |
| Anthropic | Yes | Yes | Yes | Falls back to nudge |
| OpenAI | Yes | Yes | Yes | Falls back to nudge |
| Gemini | Yes | Yes | Yes | Falls back to nudge |

Agent Containment

Agent containment is a separate enforcement layer that operates above the per-request enforcement modes. While enforcement modes (observe/nudge/enforce) control individual request handling, containment controls whether the agent can make requests at all.

Containment States

| State | Meaning | Gateway Behavior |
| --- | --- | --- |
| active | Normal operation (default) | Requests proceed normally |
| paused | Temporarily stopped | All requests blocked with HTTP 403 |
| killed | Permanently stopped | All requests blocked with HTTP 403 |
Paused agents can be resumed by an org owner or admin. Killed agents require explicit reactivation by an owner only. The distinction matters for audit: pause means “we need to investigate,” kill means “this agent is compromised.”

Containment API

# Pause an agent
curl -X POST https://api.mnemom.ai/v1/orgs/{org_id}/agents/{agent_id}/pause \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"reason": "Investigating boundary violations"}'

# Resume a paused agent
curl -X POST https://api.mnemom.ai/v1/orgs/{org_id}/agents/{agent_id}/resume \
  -H "Authorization: Bearer $TOKEN"

# Kill an agent (owner only)
curl -X POST https://api.mnemom.ai/v1/orgs/{org_id}/agents/{agent_id}/kill \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"reason": "Agent compromised"}'

# Reactivate a killed agent (owner only)
curl -X POST https://api.mnemom.ai/v1/orgs/{org_id}/agents/{agent_id}/reactivate \
  -H "Authorization: Bearer $TOKEN"

# Get containment status and audit log
curl https://api.mnemom.ai/v1/orgs/{org_id}/agents/{agent_id}/containment \
  -H "Authorization: Bearer $TOKEN"

Error Response

When a contained agent attempts an API request through the gateway, it receives:
{
  "error": "Agent contained",
  "type": "containment_error",
  "reason": "agent_paused"
}
The HTTP status code is 403 Forbidden (distinct from 402 Payment Required used for billing enforcement).
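A client can use the status code together with the "type" field to tell containment apart from other failures. The sketch below is a hypothetical helper, not part of any SDK; it assumes that other 403 responses (for example, enforce-mode blocks) carry a different "type" value or no JSON body at all.

```python
import json


def classify_gateway_error(status: int, body: str) -> str:
    """Map a gateway error response to a coarse category for client-side handling."""
    if status == 402:
        return "billing"
    if status == 403:
        try:
            payload = json.loads(body)
        except json.JSONDecodeError:
            payload = {}
        if isinstance(payload, dict) and payload.get("type") == "containment_error":
            # Surface the reason (e.g. agent_paused) so callers can stop retrying.
            return f"contained:{payload.get('reason', 'unknown')}"
        return "forbidden"
    return "other"
```

Retrying a contained request will not succeed until an org owner or admin changes the agent's state, so callers would normally stop and alert instead.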

Auto-Containment

Agents can be configured to automatically pause after consecutive boundary violations:
# Enable auto-containment after 3 consecutive violations
curl -X PUT https://api.mnemom.ai/v1/orgs/{org_id}/agents/{agent_id}/containment-policy \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"auto_containment_threshold": 3}'

# Disable auto-containment
curl -X PUT https://api.mnemom.ai/v1/orgs/{org_id}/agents/{agent_id}/containment-policy \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"auto_containment_threshold": null}'
When auto-containment triggers, it:
  • Sets the agent status to paused with actor system
  • Logs the action in the containment audit log
  • Emits an agent.paused webhook event
  • Purges the gateway cache so the block takes effect immediately
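The "consecutive violations" rule can be pictured as a counter that resets on every clean interaction. The sketch below is an assumption-laden illustration, not the gateway's logic: the class name is hypothetical, and both the reset-on-clean behavior and the comparison `>=` are guesses at what "consecutive" and "threshold" mean here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AutoContainment:
    threshold: Optional[int]  # None disables auto-containment, mirroring the null payload
    consecutive: int = 0
    status: str = "active"

    def record(self, violated: bool) -> str:
        """Record one interaction and return the resulting containment status."""
        if self.status != "active":
            return self.status  # already contained; nothing more to count
        # Assumption: any interaction without a violation resets the streak.
        self.consecutive = self.consecutive + 1 if violated else 0
        if self.threshold is not None and self.consecutive >= self.threshold:
            self.status = "paused"
        return self.status
```

With a threshold of 3, the sequence violation, violation, clean, violation, violation leaves the agent active, and one further violation pauses it.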

RBAC Requirements

| Action | Required Role |
| --- | --- |
| Pause | owner or admin |
| Resume | owner or admin |
| Kill | owner only |
| Reactivate | owner only |
| View status | Any org role |
| Set auto-containment policy | owner or admin |
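The containment states and role requirements can be combined into one transition table. This is a sketch under stated assumptions rather than the service's actual rules: the names are hypothetical, kill is assumed to be allowed from both active and paused, and reactivate is assumed to return the agent to active.

```python
# action: (states the action is allowed from, resulting state, roles allowed)
TRANSITIONS = {
    "pause": ({"active"}, "paused", {"owner", "admin"}),
    "resume": ({"paused"}, "active", {"owner", "admin"}),
    "kill": ({"active", "paused"}, "killed", {"owner"}),  # assumed source states
    "reactivate": ({"killed"}, "active", {"owner"}),  # assumed to restore active
}


def apply_action(state: str, action: str, role: str) -> str:
    """Return the new containment state, or raise if the action is not permitted."""
    if action not in TRANSITIONS:
        raise ValueError(f"unknown action: {action!r}")
    allowed_from, new_state, roles = TRANSITIONS[action]
    if role not in roles:
        raise PermissionError(f"role {role!r} may not {action} an agent")
    if state not in allowed_from:
        raise ValueError(f"cannot {action} an agent in state {state!r}")
    return new_state
```

Checking the role before the state means an admin attempting a kill gets a permission error regardless of the agent's current state.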

Webhook Events

Three webhook events are emitted for containment actions:
  • agent.paused — Agent was paused (manually or automatically)
  • agent.resumed — Agent was resumed or reactivated
  • agent.killed — Agent was killed
Each event includes:
{
  "agent_id": "smolt-xxxxxxxx",
  "org_id": "org-xxx",
  "action": "pause",
  "actor": "user-xxx",
  "reason": "Investigating boundary violations",
  "previous_status": "active",
  "new_status": "paused"
}
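A webhook consumer might parse this payload into a typed object before acting on it. The sketch below is illustrative: `ContainmentEvent` and `parse_containment_event` are hypothetical names, the payload is assumed to arrive as an already-decoded dict, and "reason" is treated as optional because the resume and reactivate calls shown earlier do not send one.

```python
from dataclasses import dataclass
from typing import Optional

REQUIRED_FIELDS = ("agent_id", "org_id", "action", "actor", "previous_status", "new_status")


@dataclass(frozen=True)
class ContainmentEvent:
    agent_id: str
    org_id: str
    action: str
    actor: str
    previous_status: str
    new_status: str
    reason: Optional[str] = None

    @property
    def is_automatic(self) -> bool:
        # Auto-containment records the actor as "system".
        return self.actor == "system"


def parse_containment_event(payload: dict) -> ContainmentEvent:
    """Validate required fields and build a ContainmentEvent from a webhook payload."""
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    return ContainmentEvent(
        **{name: payload[name] for name in REQUIRED_FIELDS},
        reason=payload.get("reason"),
    )
```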

Containment Audit Log

Every containment action is recorded in a tamper-evident audit log. Each entry includes:
  • The action taken (pause, resume, kill, reactivate, auto_pause)
  • Who triggered it (user ID or system for auto-containment)
  • The reason provided
  • Previous and new containment states
  • Timestamp
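This page does not say how tamper evidence is achieved. One common technique is a hash chain, in which each entry's hash covers the previous entry's hash, so editing any earlier entry invalidates everything after it. The sketch below illustrates that general idea only; it is not a description of Mnemom's audit log, and all names in it are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _digest(prev_hash: str, entry: dict) -> str:
    # sort_keys gives a stable serialization so the same entry always hashes the same.
    body = json.dumps(entry, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: list, entry: dict) -> dict:
    """Append an entry whose hash covers both its content and the previous hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    record = {"entry": entry, "prev_hash": prev_hash, "hash": _digest(prev_hash, entry)}
    log.append(record)
    return record


def verify(log: list) -> bool:
    """Recompute every hash in order; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for record in log:
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["entry"]):
            return False
        prev_hash = record["hash"]
    return True
```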