Security and Sandboxing
An autonomous AI agent that can execute code, call APIs, and manage files introduces real security risks. OpenClaw provides multiple layers of protection—Docker sandboxing, execution approvals, access controls, and security auditing. In this lesson you will learn what can go wrong and how to harden your setup.
Core Risks
Before configuring defenses, understand what you are defending against:
| Risk | Description |
|---|---|
| Prompt injection | A malicious message tricks the agent into executing unintended actions |
| Skill supply chain | A community skill contains hidden malware or data exfiltration code |
| Tool misuse | The model calls a dangerous tool (e.g., `rm -rf /`) by mistake or manipulation |
| Unauthorized access | Someone messages your agent and gets it to perform actions on your behalf |
| Data leakage | Sensitive information from memory or files is exposed through a channel |
Cisco's research found that 26% of 31,000 analyzed agent skills contained vulnerabilities including command injection, data exfiltration, and prompt injection attacks. Security is not optional.
Docker Sandbox
The most important security measure is running tools inside a Docker container. This creates an isolated environment where a misbehaving skill cannot damage your host system.
OpenClaw supports two sandboxing approaches:
Full Containerization
Run the entire Gateway inside Docker:
```bash
docker run -d \
  --name openclaw \
  -v ~/.openclaw:/workspace \
  -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
  openclaw/openclaw:latest
```
This gives you a complete container boundary with an isolated filesystem, restricted network access, and CPU/memory limits.
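If you want the CPU/memory limits to be explicit rather than implied, standard Docker flags can be added to the same command. The limit values below are illustrative examples, not OpenClaw defaults:

```bash
# Same container, with explicit resource caps (values are examples):
#   --cpus 2          limit the container to two CPU cores
#   --memory 2g       cap memory at 2 GB
#   --pids-limit 256  cap process count to blunt runaway spawning
docker run -d \
  --name openclaw \
  --cpus 2 \
  --memory 2g \
  --pids-limit 256 \
  -v ~/.openclaw:/workspace \
  -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
  openclaw/openclaw:latest
```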
Per-Tool Sandbox
Alternatively, run only tool executions in Docker while keeping the Gateway on your host:
```json
{
  "sandbox": {
    "mode": "all",
    "scope": "agent",
    "workspaceAccess": "ro"
  }
}
```
The `workspaceAccess` setting controls what the sandboxed tool can see:

| Value | Meaning |
|---|---|
| `"none"` | No access to workspace files |
| `"ro"` | Read-only access (recommended) |
| `"rw"` | Read-write access (use with caution) |

For most setups, `"ro"` is the right balance—skills can read your files to answer questions but cannot modify or delete them.
Execution Approvals
Execution approvals require your confirmation before the agent runs certain commands. This is your last line of defense before a tool actually executes.
Configure approvals in Settings or via `exec-approvals.json`:

| Mode | Behavior |
|---|---|
| `security` | Blocks high-risk commands (file deletion, network calls) unless approved |
| `ask` | Prompts you for approval on every tool execution |
| `allowlist` | Only pre-approved commands can execute without prompting |

For beginners, `ask` mode is safest—you see exactly what the agent wants to do before it happens. As you build trust, switch to `security` or `allowlist` for a smoother experience.
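As a sketch of what an allowlist setup might look like, here is a hypothetical `exec-approvals.json`. The exact schema is not shown in this lesson, so treat the field names (`mode`, `allowlist`) as assumptions to verify against the reference docs:

```json
{
  "mode": "allowlist",
  "allowlist": [
    "git status",
    "git log",
    "npm test"
  ]
}
```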
When an approval is needed, the agent sends you a message like:
```text
I'd like to run: git push origin main
Allow? (yes/no)
```
You reply "yes" or "no" directly in your messaging platform.
Access Control
OpenClaw provides four DM access modes that control who can message your agent:
| Mode | How It Works |
|---|---|
| Pairing (default) | Time-limited pairing codes; most secure for new setups |
| Allowlist | Only specified user IDs can interact |
| Open | Anyone can message (requires explicit opt-in) |
| Disabled | DMs are turned off completely |
For group channels (Discord servers, Slack workspaces), you can set per-group policies and multi-user DM isolation so conversations stay private.
Recommendation: Start with Pairing mode. Add trusted users to the Allowlist as needed. Never use Open mode unless you specifically want a public-facing agent.
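If you manage access control in configuration rather than Settings, a policy block might look like the sketch below. The key names (`dmPolicy`, `mode`, `allowlist`) and the user ID format are illustrative assumptions, not documented schema:

```json
{
  "dmPolicy": {
    "mode": "pairing",
    "allowlist": ["discord:123456789012345678"]
  }
}
```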
Security Audit CLI
OpenClaw includes a built-in security audit tool:
```bash
# Standard security check
openclaw security audit

# Deep audit with live Gateway probing
openclaw security audit --deep

# Automatically apply safe guardrails
openclaw security audit --fix
```
The audit checks for:
- Insecure sandbox configuration
- Missing execution approvals
- Overly permissive access controls
- Known vulnerable skills
- Exposed API keys in configuration files
Run `openclaw security audit` after every configuration change and periodically as part of your maintenance routine.
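One way to make the periodic part automatic is a cron entry. The schedule and log path below are just examples:

```bash
# Weekly deep audit, Mondays at 09:00; append results to a log file.
# Install with: crontab -e
0 9 * * 1 openclaw security audit --deep >> ~/.openclaw/audit.log 2>&1
```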
Cisco AI Skill Scanner
Cisco released a dedicated scanner for OpenClaw skills that detects:
- Command injection — skills that can execute arbitrary system commands
- Data exfiltration — skills that send your data to external servers
- Prompt injection — skills with instructions designed to hijack agent behavior
You can scan skills before installing them:
```bash
# Scan a skill directory
cisco-skill-scanner ./skills/suspicious-skill/

# Scan all installed skills
cisco-skill-scanner ~/.openclaw/skills/
```
ClawHub also integrates VirusTotal scanning automatically. Skills are continuously rechecked and tagged as approved, warned, or blocked.
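To gate installs on a clean scan, you can wrap the scanner in a small shell loop. This sketch assumes the scanner exits non-zero when it finds problems; verify that against its documentation before relying on it:

```bash
#!/usr/bin/env bash
# Scan each pending skill; stop before installing anything that fails.
# Assumes cisco-skill-scanner returns a non-zero exit code on findings.
set -euo pipefail
shopt -s nullglob  # an empty pending-skills/ dir just skips the loop

for skill in ./pending-skills/*/; do
  echo "Scanning $skill"
  if ! cisco-skill-scanner "$skill"; then
    echo "BLOCKED: $skill failed the scan" >&2
    exit 1
  fi
done
echo "All skills passed; safe to move them into ~/.openclaw/skills/"
```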
Model Choice Matters
Your choice of AI model affects security. The OpenClaw documentation recommends:
Prefer Anthropic Opus 4.6 (or the latest Opus) because it is strong at recognizing prompt injections. Smaller or cheaper models are more susceptible to tool misuse and instruction hijacking.
If you use a budget model for cost savings, pair it with stricter sandbox settings and execution approvals.
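As a sketch of that pairing, here is a hypothetical configuration that names a cheaper model but compensates with a no-access sandbox and prompt-on-everything approvals. The `model` and `execApprovals` keys and the model identifier are assumptions; the `sandbox` values come from the table earlier in this lesson:

```json
{
  "model": "anthropic/claude-haiku",
  "sandbox": {
    "mode": "all",
    "scope": "agent",
    "workspaceAccess": "none"
  },
  "execApprovals": {
    "mode": "ask"
  }
}
```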
Hardening Checklist
Here is a practical checklist to secure your OpenClaw installation:
- Enable Docker sandbox with `workspaceAccess: "ro"` at minimum
- Set execution approvals to `ask` or `security` mode
- Use Pairing or Allowlist access control
- Run `openclaw security audit --deep` after setup
- Only install skills from ClawHub with an "approved" verdict
- Scan third-party skills with Cisco's scanner before installing
- Use a strong model (Opus-class) for primary agent tasks
- Review JSONL transcripts periodically for unexpected tool calls
- Back up your workspace directory regularly
- Keep OpenClaw updated with `npm update -g openclaw@latest` (a script bundling the routine items follows this list)
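To keep the routine items from slipping, you can bundle the update, audit, scan, and backup steps into one maintenance script. The paths and the scanner's exit-code behavior are assumptions, as noted earlier:

```bash
#!/usr/bin/env bash
# Periodic maintenance: update OpenClaw, re-audit, re-scan skills, back up.
set -euo pipefail

npm update -g openclaw@latest
openclaw security audit --deep
cisco-skill-scanner ~/.openclaw/skills/
mkdir -p ~/backups
tar -czf ~/backups/openclaw-$(date +%F).tar.gz -C "$HOME" .openclaw
```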
Key Takeaway
Security is not optional when running an autonomous AI agent. OpenClaw provides Docker sandboxing, execution approvals, access controls, and audit tools. The biggest risks come from malicious skills and prompt injection attacks. Use the hardening checklist above, start with restrictive settings, and loosen them only as you build trust. In the next lesson, you will put everything together by building your first practical workflows.

