Sandboxing and Safety in LLM Agent Tool Execution

Allowing LLMs to execute tools—especially shell commands—requires robust safety mechanisms. This guide covers how Claude Code and Codex approach safety, and how to build similar protections.

Safety Philosophy

Defense in Depth

Both systems use multiple layers of protection:

┌─────────────────────────────────────────────────────────┐
│                    USER LAYER                           │
│  • Approval prompts for dangerous operations            │
│  • Permission modes (read-only, workspace, full)        │
└─────────────────────────────────────────────────────────┘
                           │
┌─────────────────────────────────────────────────────────┐
│                 APPLICATION LAYER                       │
│  • Hooks for tool validation (Claude Code)              │
│  • Approval policies (Codex)                            │
│  • Command safety classification                        │
└─────────────────────────────────────────────────────────┘
                           │
┌─────────────────────────────────────────────────────────┐
│                   SYSTEM LAYER                          │
│  • OS-level sandboxing (Seatbelt, Landlock, seccomp)   │
│  • Filesystem restrictions                              │
│  • Network isolation                                    │
└─────────────────────────────────────────────────────────┘
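
No single layer is trusted on its own: a tool call runs only if every layer lets it through. A minimal sketch of that composition (all names here are illustrative, not taken from either codebase):

// Hypothetical composition of the three layers; any layer can veto execution.
struct ToolRequest {
    command: Vec<String>,
}

fn system_layer_ok() -> bool {
    // Stand-in for "an OS sandbox is available on this platform".
    cfg!(any(target_os = "macos", target_os = "linux"))
}

fn application_layer_check(req: &ToolRequest) -> Result<(), String> {
    // Stand-in for static command classification and hooks.
    let cmd = req.command.join(" ");
    if cmd.contains("rm -rf /") {
        return Err("recursive delete of root".into());
    }
    Ok(())
}

fn user_layer_approves(_req: &ToolRequest) -> bool {
    // Stand-in for an interactive approval prompt.
    true
}

fn check_layers(req: &ToolRequest) -> Result<(), String> {
    if !system_layer_ok() {
        return Err("no OS sandbox available; failing closed".into());
    }
    application_layer_check(req)?;
    if !user_layer_approves(req) {
        return Err("rejected by user".into());
    }
    Ok(())
}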

Claude Code's Hook-Based Safety

Hook Events for Safety

Claude Code uses hooks at key lifecycle points:

Hook Event       | Purpose                    | Use Case
PreToolUse       | Validate before execution  | Block dangerous commands
PostToolUse      | React to results           | Log operations, alert on issues
Stop             | Validate completion        | Ensure tests pass before stopping
UserPromptSubmit | Validate user input        | Add security reminders

PreToolUse Hook Example

Block dangerous file operations:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [
          {
            "type": "command",
            "command": "bash ${CLAUDE_PLUGIN_ROOT}/hooks/validate-write.sh",
            "timeout": 10
          }
        ]
      }
    ]
  }
}

The hook command above points at this script:

#!/bin/bash
# validate-write.sh
set -euo pipefail

input=$(cat)
file_path=$(echo "$input" | jq -r '.tool_input.file_path // empty')

# Deny path traversal
if [[ "$file_path" == *".."* ]]; then
  echo '{"hookSpecificOutput": {"permissionDecision": "deny"}, "systemMessage": "Path traversal not allowed"}' >&2
  exit 2
fi

# Deny sensitive files
SENSITIVE_PATTERNS=(".env" ".ssh" ".aws" "credentials" "secrets")
for pattern in "${SENSITIVE_PATTERNS[@]}"; do
  if [[ "$file_path" == *"$pattern"* ]]; then
    echo '{"hookSpecificOutput": {"permissionDecision": "deny"}, "systemMessage": "Cannot modify sensitive file: $pattern"}' >&2
    exit 2
  fi
done

# Allow operation
echo '{}'
exit 0

Prompt-Based Hooks

Use LLM reasoning for complex validation:

{
  "PreToolUse": [
    {
      "matcher": "Bash",
      "hooks": [
        {
          "type": "prompt",
          "prompt": "Evaluate if this bash command is safe to execute. Consider: system damage, data loss, network access, privilege escalation. Command: $TOOL_INPUT. Return 'approve' or 'deny' with reason.",
          "timeout": 30
        }
      ]
    }
  ]
}

Permission Modes

Claude Code's allowed-tools in command frontmatter:

---
# Restrictive: only specific tools
allowed-tools: [Read, Grep]
---

---
# Bash restricted to git commands only
allowed-tools: [Bash(git:*)]
---

---
# All tools (dangerous)
allowed-tools: [*]
---

Codex's Sandbox Architecture

Approval Policies

Codex defines when user approval is required:

pub enum AskForApproval {
    Never,           // Never ask, auto-approve all
    OnFailure,       // Ask only when sandbox denies
    OnRequest,       // Ask for operations outside sandbox
    UnlessTrusted,   // Always ask unless whitelisted
}
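
How a policy translates into a per-command decision is roughly the following (a sketch: AskForApproval is the enum above, is_known_safe_command is the classifier shown later, and the remaining names are illustrative):

// Sketch: map an approval policy to a per-command decision.
enum ApprovalDecision {
    AutoApprove,
    AskUser,
}

fn approval_decision(
    policy: &AskForApproval,
    command: &[String],
    sandbox_denied_before: bool,
) -> ApprovalDecision {
    match policy {
        AskForApproval::Never => ApprovalDecision::AutoApprove,
        // Only escalate to the user after the sandbox has already rejected the command.
        AskForApproval::OnFailure if sandbox_denied_before => ApprovalDecision::AskUser,
        AskForApproval::OnFailure => ApprovalDecision::AutoApprove,
        // The model requests escalation explicitly; otherwise run sandboxed without prompting.
        AskForApproval::OnRequest => ApprovalDecision::AutoApprove,
        // Known-safe commands skip the prompt; everything else asks.
        AskForApproval::UnlessTrusted if is_known_safe_command(command) => {
            ApprovalDecision::AutoApprove
        }
        AskForApproval::UnlessTrusted => ApprovalDecision::AskUser,
    }
}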

Sandbox Modes

OS-level isolation for command execution:

pub enum SandboxPolicy {
    ReadOnly,           // No writes, no network
    WorkspaceWrite,     // Write in workspace only
    DangerFullAccess,   // No restrictions
}

Platform-Specific Sandboxing

macOS (Seatbelt):

// Uses Apple's sandbox-exec with custom profiles
pub fn create_seatbelt_profile(policy: &SandboxPolicy, cwd: &Path) -> String {
    match policy {
        SandboxPolicy::ReadOnly => {
            format!(r#"
                (version 1)
                (deny default)
                (allow file-read*)
                (deny network*)
            "#)
        }
        SandboxPolicy::WorkspaceWrite => {
            format!(r#"
                (version 1)
                (deny default)
                (allow file-read*)
                (allow file-write* (subpath "{}"))
                (allow file-write* (subpath "/tmp"))
                (deny network*)
            "#, cwd.display())
        }
        SandboxPolicy::DangerFullAccess => {
            "(version 1)\n(allow default)".to_string()
        }
    }
}
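
The generated profile string is then handed to macOS's sandbox-exec when spawning the command. A simplified sketch (error handling and output streaming omitted; Codex's actual spawning code differs):

use std::path::Path;
use std::process::{Command, Output};

// Simplified sketch: run a command under the Seatbelt profile generated above.
pub fn run_under_seatbelt(
    policy: &SandboxPolicy,
    cwd: &Path,
    cmd: &[String],
) -> std::io::Result<Output> {
    let profile = create_seatbelt_profile(policy, cwd);
    Command::new("/usr/bin/sandbox-exec")
        .arg("-p") // inline profile string
        .arg(profile)
        .arg(&cmd[0])
        .args(&cmd[1..])
        .current_dir(cwd)
        .output()
}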

Linux (Landlock + seccomp):

pub fn apply_landlock_policy(policy: &SandboxPolicy, cwd: &Path) -> Result<(), Error> {
    let mut ruleset = Ruleset::new()?;
    
    match policy {
        SandboxPolicy::ReadOnly => {
            // Read-only access to entire filesystem
            ruleset.add_rule(PathBeneathRules::new(
                "/",
                AccessFs::ReadFile | AccessFs::ReadDir,
            ))?;
        }
        SandboxPolicy::WorkspaceWrite => {
            // Read everywhere
            ruleset.add_rule(PathBeneathRules::new(
                "/",
                AccessFs::ReadFile | AccessFs::ReadDir,
            ))?;
            // Write only in workspace
            ruleset.add_rule(PathBeneathRules::new(
                cwd,
                AccessFs::all(),
            ))?;
        }
        _ => {}
    }
    
    ruleset.apply()?;
    Ok(())
}

The Orchestrator Pattern

Codex's ToolOrchestrator manages the safety flow:

pub struct ToolOrchestrator {
    sandbox: SandboxManager,
}

impl ToolOrchestrator {
    pub async fn run<Rq, Out, T>(
        &mut self,
        tool: &mut T,
        req: &Rq,
        tool_ctx: &ToolCtx<'_>,
        turn_ctx: &TurnContext,
        approval_policy: AskForApproval,
    ) -> Result<Out, ToolError>
    where
        T: ToolRuntime<Rq, Out>,
    {
        // 1. CHECK APPROVAL REQUIREMENT
        let requirement = tool.exec_approval_requirement(req)
            .unwrap_or_else(|| default_exec_approval_requirement(approval_policy, &turn_ctx.sandbox_policy));

        match requirement {
            ExecApprovalRequirement::Skip { .. } => {
                // Auto-approved, continue
            }
            ExecApprovalRequirement::Forbidden { reason } => {
                return Err(ToolError::Rejected(reason));
            }
            ExecApprovalRequirement::NeedsApproval { reason, .. } => {
                // Request user approval
                let decision = tool.start_approval_async(req, tool_ctx).await;
                match decision {
                    ReviewDecision::Denied | ReviewDecision::Abort => {
                        return Err(ToolError::Rejected("rejected by user".into()));
                    }
                    ReviewDecision::Approved | ReviewDecision::ApprovedForSession => {
                        // Continue
                    }
                }
            }
        }

        // 2. SELECT SANDBOX MODE
        let initial_sandbox = self.sandbox.select_initial(&turn_ctx.sandbox_policy);

        // 3. FIRST ATTEMPT (sandboxed)
        let attempt = SandboxAttempt {
            sandbox: initial_sandbox,
            policy: &turn_ctx.sandbox_policy,
            manager: &self.sandbox,
            sandbox_cwd: &turn_ctx.cwd,
        };

        match tool.run(req, &attempt, tool_ctx).await {
            Ok(out) => Ok(out),
            
            // 4. HANDLE SANDBOX DENIAL
            Err(ToolError::Codex(CodexErr::Sandbox(SandboxErr::Denied { output }))) => {
                // Ask for escalation approval
                if !tool.wants_no_sandbox_approval(approval_policy) {
                    return Err(ToolError::Codex(CodexErr::Sandbox(SandboxErr::Denied { output })));
                }

                // Request escalation
                let decision = tool.start_approval_async(req, escalation_ctx).await;
                if decision != ReviewDecision::Approved {
                    return Err(ToolError::Rejected("escalation rejected".into()));
                }

                // 5. RETRY WITHOUT SANDBOX
                let escalated_attempt = SandboxAttempt {
                    sandbox: SandboxType::None,
                    ..attempt
                };
                tool.run(req, &escalated_attempt, tool_ctx).await
            }
            other => other,
        }
    }
}

Command Safety Classification

Codex classifies commands as safe/dangerous:

pub fn is_known_safe_command(command: &[String]) -> bool {
    let safe_commands = [
        // Read-only commands
        "ls", "cat", "head", "tail", "less", "more",
        "grep", "find", "which", "pwd", "env",
        // Git read operations
        "git status", "git log", "git diff", "git show",
        // Build info
        "cargo --version", "npm --version", "node --version",
    ];

    let cmd_str = command.join(" ");
    safe_commands.iter().any(|safe| cmd_str.starts_with(safe))
}

pub fn is_dangerous_command(command: &[String]) -> Option<&'static str> {
    let dangerous_patterns = [
        ("rm -rf /", "Recursive delete of root"),
        ("chmod 777", "Overly permissive permissions"),
        ("curl | bash", "Remote code execution"),
        ("sudo", "Privilege escalation"),
        ("> /dev/", "Writing to device files"),
    ];

    let cmd_str = command.join(" ");
    for (pattern, reason) in dangerous_patterns {
        if cmd_str.contains(pattern) {
            return Some(reason);
        }
    }
    None
}
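
The two checks combine into a three-way classification that feeds the approval decision; a minimal sketch built on the functions above:

// Sketch: fold the two checks above into a single classification.
pub enum CommandClass {
    Safe,
    Dangerous(&'static str),
    Unknown,
}

pub fn classify_command(command: &[String]) -> CommandClass {
    if let Some(reason) = is_dangerous_command(command) {
        return CommandClass::Dangerous(reason);
    }
    if is_known_safe_command(command) {
        return CommandClass::Safe;
    }
    // Neither allowlisted nor flagged: defer to the approval policy.
    CommandClass::Unknown
}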

Building Safety Mechanisms

Step 1: Define Approval Requirements

pub enum ApprovalRequirement {
    /// No approval needed
    Skip,
    /// User must approve
    NeedsApproval { reason: String },
    /// Operation forbidden entirely
    Forbidden { reason: String },
}

pub fn determine_approval_requirement(
    tool: &str,
    args: &ToolArgs,
    policy: &SecurityPolicy,
) -> ApprovalRequirement {
    // Check if tool is in allowlist
    if policy.allowed_tools.contains(tool) {
        return ApprovalRequirement::Skip;
    }

    // Check for dangerous operations
    if let Some(reason) = is_dangerous_operation(tool, args) {
        if policy.strict_mode {
            return ApprovalRequirement::Forbidden { reason };
        }
        return ApprovalRequirement::NeedsApproval { reason };
    }

    // Default based on policy
    match policy.default_approval {
        DefaultApproval::Allow => ApprovalRequirement::Skip,
        DefaultApproval::Ask => ApprovalRequirement::NeedsApproval {
            reason: "Operation requires approval".into(),
        },
        DefaultApproval::Deny => ApprovalRequirement::Forbidden {
            reason: "Operation not permitted".into(),
        },
    }
}
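
The supporting types referenced above can stay small; an illustrative definition (not from either codebase):

use std::collections::HashSet;

// Illustrative supporting types for determine_approval_requirement.
pub enum DefaultApproval {
    Allow,
    Ask,
    Deny,
}

pub struct SecurityPolicy {
    /// Tools that never require approval (e.g. "Read", "Grep").
    pub allowed_tools: HashSet<String>,
    /// In strict mode, dangerous operations are forbidden rather than prompted.
    pub strict_mode: bool,
    /// Fallback for tools that are neither allowlisted nor flagged as dangerous.
    pub default_approval: DefaultApproval,
}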

Step 2: Implement Approval Cache

Avoid asking for the same approval repeatedly:

pub struct ApprovalCache {
    session_approvals: HashSet<ApprovalKey>,
}

impl ApprovalCache {
    pub fn is_approved(&self, key: &ApprovalKey) -> bool {
        self.session_approvals.contains(key)
    }

    pub fn approve_for_session(&mut self, key: ApprovalKey) {
        self.session_approvals.insert(key);
    }
}

// Usage
pub async fn get_approval<F, Fut>(
    cache: &mut ApprovalCache,
    key: ApprovalKey,
    prompt_user: F,
) -> bool
where
    F: FnOnce() -> Fut,
    Fut: std::future::Future<Output = bool>,
{
    if cache.is_approved(&key) {
        return true;
    }

    let approved = prompt_user().await;
    if approved {
        cache.approve_for_session(key);
    }
    approved
}
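
An ApprovalKey only needs to identify "the same operation" for the rest of the session. One reasonable (illustrative) choice is the tool name plus a normalized form of its arguments:

// Illustrative ApprovalKey: tool name plus normalized arguments, so that an
// approved command keeps its approval for the rest of the session.
#[derive(Clone, PartialEq, Eq, Hash)]
pub struct ApprovalKey {
    tool: String,
    normalized_args: String,
}

impl ApprovalKey {
    pub fn for_bash_command(command: &[String]) -> Self {
        // Collapse whitespace so trivially different spellings map to one key.
        let normalized = command
            .join(" ")
            .split_whitespace()
            .collect::<Vec<_>>()
            .join(" ");
        Self {
            tool: "Bash".to_string(),
            normalized_args: normalized,
        }
    }
}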

Step 3: Build Sandbox Abstraction

pub trait Sandbox {
    fn prepare(&self, cwd: &Path) -> Result<SandboxEnv, SandboxError>;
    fn execute(&self, cmd: &[String], env: SandboxEnv) -> Result<Output, SandboxError>;
    fn is_supported(&self) -> bool;
}

pub struct SandboxManager {
    preferred: Box<dyn Sandbox>,
    fallbacks: Vec<Box<dyn Sandbox>>,
}

impl SandboxManager {
    pub fn new() -> Self {
        #[cfg(target_os = "macos")]
        let preferred = Box::new(SeatbeltSandbox::new());
        
        #[cfg(target_os = "linux")]
        let preferred = Box::new(LandlockSandbox::new());
        
        #[cfg(target_os = "windows")]
        let preferred = Box::new(AppContainerSandbox::new());

        Self {
            preferred,
            fallbacks: vec![Box::new(NoSandbox::new())],
        }
    }

    pub fn execute(&self, cmd: &[String], cwd: &Path, policy: &SandboxPolicy) -> Result<Output, Error> {
        if matches!(policy, SandboxPolicy::DangerFullAccess) {
            return self.execute_unsandboxed(cmd, cwd);
        }

        // Try preferred sandbox
        match self.preferred.prepare(cwd) {
            Ok(env) => return self.preferred.execute(cmd, env),
            Err(e) => tracing::warn!("Primary sandbox unavailable: {e}"),
        }

        // Try fallbacks
        for fallback in &self.fallbacks {
            if let Ok(env) = fallback.prepare(cwd) {
                return fallback.execute(cmd, env);
            }
        }

        Err(Error::NoSandboxAvailable)
    }
}
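
Callers never pick a backend directly; they pass a policy and let the manager degrade gracefully. A usage sketch under the same assumed types:

// Usage sketch: the caller supplies only the command, cwd, and policy.
fn run_tests(manager: &SandboxManager) -> Result<(), Error> {
    let cmd = vec!["cargo".to_string(), "test".to_string()];
    let output = manager.execute(
        &cmd,
        Path::new("/workspace/project"),
        &SandboxPolicy::WorkspaceWrite,
    )?;
    if !output.status.success() {
        // Surface failures distinctly so an orchestrator can offer escalation.
        tracing::warn!("command failed inside sandbox: {:?}", output.status);
    }
    Ok(())
}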

Step 4: Implement Hook System

interface Hook {
  matcher: string | RegExp;
  handler: (context: HookContext) => Promise<HookResult>;
}

interface HookResult {
  decision: 'allow' | 'deny' | 'ask';
  message?: string;
  modifiedInput?: Record<string, unknown>;
}

class HookRunner {
  private hooks: Map<HookEvent, Hook[]> = new Map();

  register(event: HookEvent, hook: Hook): void {
    const hooks = this.hooks.get(event) || [];
    hooks.push(hook);
    this.hooks.set(event, hooks);
  }

  async run(event: HookEvent, context: HookContext): Promise<HookResult[]> {
    const hooks = this.hooks.get(event) || [];
    const matchingHooks = hooks.filter(h => this.matches(h.matcher, context.toolName));
    
    // Run all matching hooks in parallel
    const results = await Promise.all(
      matchingHooks.map(h => h.handler(context))
    );

    return results;
  }

  private matches(matcher: string | RegExp, toolName: string): boolean {
    if (matcher === '*') return true;
    if (typeof matcher === 'string') {
      return matcher.split('|').includes(toolName);
    }
    return matcher.test(toolName);
  }
}

// Integration with tool execution
async function executeToolWithHooks(
  hookRunner: HookRunner,
  tool: Tool,
  args: Record<string, unknown>,
): Promise<ToolResult> {
  // Run PreToolUse hooks
  const preResults = await hookRunner.run('PreToolUse', {
    toolName: tool.name,
    toolInput: args,
  });

  // Check for denials
  const denied = preResults.find(r => r.decision === 'deny');
  if (denied) {
    return { error: denied.message || 'Denied by hook' };
  }

  // Apply any input modifications
  let modifiedArgs = args;
  for (const result of preResults) {
    if (result.modifiedInput) {
      modifiedArgs = { ...modifiedArgs, ...result.modifiedInput };
    }
  }

  // Execute tool
  const result = await tool.execute(modifiedArgs);

  // Run PostToolUse hooks
  await hookRunner.run('PostToolUse', {
    toolName: tool.name,
    toolInput: modifiedArgs,
    toolResult: result,
  });

  return result;
}

Safety Patterns

Pattern 1: Escalating Permissions

Start restricted, escalate on failure:

┌──────────────────┐
│ Try Sandboxed    │
└────────┬─────────┘
         │ Denied?
┌────────▼─────────┐
│ Request Approval │
└────────┬─────────┘
         │ Approved?
┌────────▼─────────┐
│ Retry Unsandboxed│
└──────────────────┘

Pattern 2: Command Allowlists

Maintain lists of known-safe operations:

# safe-commands.yaml
read_operations:
  - ls
  - cat
  - head
  - tail
  - grep
  - find

git_operations:
  - git status
  - git log
  - git diff
  - git branch

build_operations:
  - npm install
  - npm test
  - cargo build
  - cargo test
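
A file like this can be loaded into an allowlist checker at startup. A sketch using serde_yaml, assuming the layout above (categories mapping to command prefixes):

use std::collections::HashMap;

// Sketch: load safe-commands.yaml and check commands against the allowlisted prefixes.
pub struct Allowlist {
    prefixes: Vec<String>,
}

impl Allowlist {
    pub fn load(yaml: &str) -> Result<Self, serde_yaml::Error> {
        let categories: HashMap<String, Vec<String>> = serde_yaml::from_str(yaml)?;
        Ok(Self {
            prefixes: categories.into_values().flatten().collect(),
        })
    }

    pub fn is_allowed(&self, command: &[String]) -> bool {
        let cmd = command.join(" ");
        // Prefix matching keeps the check simple; stricter matching should
        // compare whole tokens so "ls" does not also allow "lsof ...".
        self.prefixes.iter().any(|p| cmd.starts_with(p.as_str()))
    }
}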

Pattern 3: Prompt-Based Validation

Use the LLM itself for safety decisions:

Evaluate this command for safety risks:
Command: {{command}}
Working directory: {{cwd}}

Consider:
1. Data loss potential (file deletion, overwrite)
2. System impact (permissions, services)
3. Network access (external requests)
4. Credential exposure
5. Privilege escalation

Respond with:
- SAFE: No significant risks
- REVIEW: Needs human approval with explanation
- BLOCK: Should not be executed with reason
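
The model's free-text verdict then has to be parsed defensively, treating anything unrecognized as a block (fail closed). A sketch:

// Sketch: map the SAFE/REVIEW/BLOCK verdict to a decision, failing closed.
pub enum SafetyVerdict {
    Safe,
    NeedsReview(String),
    Blocked(String),
}

pub fn parse_safety_verdict(response: &str) -> SafetyVerdict {
    let trimmed = response.trim();
    // Everything after the first colon is the model's explanation, if any.
    let explanation = trimmed
        .splitn(2, ':')
        .nth(1)
        .unwrap_or("")
        .trim()
        .to_string();

    if trimmed.starts_with("SAFE") {
        SafetyVerdict::Safe
    } else if trimmed.starts_with("REVIEW") {
        SafetyVerdict::NeedsReview(explanation)
    } else {
        // BLOCK or anything unexpected: do not execute.
        SafetyVerdict::Blocked(explanation)
    }
}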

Configuration Examples

Codex config.toml

# Conservative defaults
approval_policy = "on-request"
sandbox_mode = "workspace-write"

# Network access stays disabled inside the workspace-write sandbox
[sandbox_workspace_write]
network_access = false

# Profiles for different use cases
[profiles.read_only]
approval_policy = "never"
sandbox_mode = "read-only"

[profiles.full_auto]
approval_policy = "on-request"
sandbox_mode = "workspace-write"

[profiles.dangerous]
approval_policy = "never"
sandbox_mode = "danger-full-access"

Claude Code hooks.json

{
  "description": "Security validation hooks",
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "bash ${CLAUDE_PLUGIN_ROOT}/hooks/validate-bash.sh",
            "timeout": 5
          }
        ]
      },
      {
        "matcher": "Write|Edit",
        "hooks": [
          {
            "type": "prompt",
            "prompt": "Check if writing to $TOOL_INPUT.file_path is safe. Consider sensitive files, system paths, credentials. Return approve/deny.",
            "timeout": 10
          }
        ]
      }
    ]
  }
}

Best Practices

  1. Default to restrictive: Start with minimal permissions and escalate only when needed
  2. Sandbox by default: Always sandbox command execution when possible
  3. Cache approvals: Don't ask for the same approval repeatedly within a session
  4. Log everything: Keep an audit trail of all tool executions (see the sketch after this list)
  5. Fail closed: When uncertain, deny rather than allow
  6. Defense in depth: Rely on multiple layers of protection, not a single check
  7. User transparency: Give clear feedback about what is being allowed or denied
  8. Escape hatches: Let expert users bypass restrictions with explicit flags
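
For practice 4, an append-only record per tool call is usually enough; a minimal sketch with illustrative field names:

use std::fs::OpenOptions;
use std::io::Write;
use std::time::{SystemTime, UNIX_EPOCH};

// Minimal append-only audit record for each tool execution (illustrative fields).
pub struct AuditEntry<'a> {
    pub tool: &'a str,
    pub command: &'a str,
    pub decision: &'a str, // "allowed", "denied", "escalated"
    pub sandboxed: bool,
}

pub fn append_audit(path: &str, entry: &AuditEntry) -> std::io::Result<()> {
    let ts = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0);
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    writeln!(
        file,
        "{ts}\t{}\t{}\t{}\tsandboxed={}",
        entry.tool, entry.decision, entry.command, entry.sandboxed
    )?;
    Ok(())
}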

Summary

Safety in LLM tool execution requires:

Layer       | Mechanism         | Example
User        | Approval prompts  | "Allow write to /etc/hosts?"
Application | Hooks/policies    | Block rm -rf, validate paths
System      | OS sandboxing     | Seatbelt, Landlock, seccomp

The next guide covers parallel tool execution strategies.