Model Selection and Swapping in LLM Agents

This guide covers how LLM agents select, configure, and swap models for different purposes—including the main agent, sub-agents, specialized tools, and background processing tasks.

Overview: Where Models Are Used

┌─────────────────────────────────────────────────────────────────────────────┐
│                     MODEL USAGE IN AGENT SYSTEMS                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                      MAIN AGENT                                      │   │
│  │  Primary model for user interaction and task execution               │   │
│  │  Model: User-configured (e.g., gpt-5-codex, claude-sonnet)          │   │
│  └──────────────────────────────┬──────────────────────────────────────┘   │
│                                 │                                           │
│         ┌───────────────────────┼───────────────────────┐                  │
│         │                       │                       │                  │
│         ▼                       ▼                       ▼                  │
│  ┌─────────────┐    ┌───────────────────┐    ┌─────────────────────┐      │
│  │ SUB-AGENTS  │    │ BACKGROUND TASKS  │    │ SPECIALIZED TOOLS   │      │
│  │             │    │                   │    │                     │      │
│  │ • Review    │    │ • Compaction      │    │ • Code analysis     │      │
│  │ • Analysis  │    │ • Summarization   │    │ • Search ranking    │      │
│  │ • Testing   │    │ • Auto-compact    │    │ • Validation        │      │
│  │             │    │                   │    │                     │      │
│  │ Model:      │    │ Model:            │    │ Model:              │      │
│  │ Configurable│    │ Same as main OR   │    │ Often smaller/      │      │
│  │ per-agent   │    │ dedicated compact │    │ specialized         │      │
│  └─────────────┘    └───────────────────┘    └─────────────────────┘      │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
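
One way to picture these slots is a small configuration sketch. This is illustrative only; the type, field, and helper names below are assumptions for the sketch, not a real Codex or Claude Code schema.

use std::collections::HashMap;

/// Hypothetical: the model "slots" an agent system might carry.
struct AgentModels {
    /// User-configured primary model, e.g. "gpt-5-codex"
    main: String,
    /// Per-sub-agent overrides, e.g. "code-reviewer" -> "opus"
    sub_agents: HashMap<String, String>,
    /// Background/compaction model; None means reuse the main model
    compaction: Option<String>,
}

impl AgentModels {
    /// Resolve a sub-agent's model, falling back to the main model.
    fn model_for_agent(&self, name: &str) -> &str {
        self.sub_agents
            .get(name)
            .map(String::as_str)
            .unwrap_or(self.main.as_str())
    }
}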

Part 1: Model Families (Codex)

Codex groups models into "families" that share characteristics:

pub struct ModelFamily {
    /// Full model slug (e.g., "gpt-5.1-codex-2025-01-01")
    pub slug: String,
    
    /// Family name (e.g., "gpt-5.1-codex")
    pub family: String,
    
    /// Whether model needs special apply_patch instructions
    pub needs_special_apply_patch_instructions: bool,
    
    /// Whether model supports reasoning summaries
    pub supports_reasoning_summaries: bool,
    
    /// Default reasoning effort (low/medium/high)
    pub default_reasoning_effort: Option<ReasoningEffort>,
    
    /// Whether model supports parallel tool calls
    pub supports_parallel_tool_calls: bool,
    
    /// Type of apply_patch tool to use
    pub apply_patch_tool_type: Option<ApplyPatchToolType>,
    
    /// Base system prompt for this family
    pub base_instructions: String,
    
    /// Experimental tools available to this family
    pub experimental_supported_tools: Vec<String>,
    
    /// Effective context window (percentage of total)
    pub effective_context_window_percent: i64,
    
    /// Preferred shell tool type
    pub shell_type: ConfigShellToolType,
    
    /// Default truncation policy for tool outputs
    pub truncation_policy: TruncationPolicy,
}

Family Resolution

Models are matched to families by prefix:

pub fn find_family_for_model(slug: &str) -> ModelFamily {
    if slug.starts_with("o3") {
        model_family!(
            slug, "o3",
            supports_reasoning_summaries: true,
            needs_special_apply_patch_instructions: true,
        )
    } else if slug.starts_with("gpt-5.1-codex-max") {
        model_family!(
            slug, slug,
            supports_reasoning_summaries: true,
            base_instructions: GPT_5_1_CODEX_MAX_INSTRUCTIONS.to_string(),
            apply_patch_tool_type: Some(ApplyPatchToolType::Freeform),
            shell_type: ConfigShellToolType::ShellCommand,
            supports_parallel_tool_calls: true,
            truncation_policy: TruncationPolicy::Tokens(10_000),
        )
    } else if slug.starts_with("gpt-5-codex") {
        model_family!(
            slug, slug,
            supports_reasoning_summaries: true,
            base_instructions: GPT_5_CODEX_INSTRUCTIONS.to_string(),
            apply_patch_tool_type: Some(ApplyPatchToolType::Freeform),
            // ...
        )
    } else {
        // Default family for unknown models
        derive_default_model_family(slug)
    }
}
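
Because matching is prefix-based, arm order matters: longer prefixes such as "gpt-5.1-codex-max" must be checked before shorter ones such as "gpt-5-codex", or the wrong family wins. A small usage sketch, assuming the macro populates the fields shown earlier:

// Dated slugs resolve to their family by prefix.
let fam = find_family_for_model("o3-2025-04-16");
assert_eq!(fam.family, "o3");
assert!(fam.supports_reasoning_summaries);

// Unrecognized slugs fall through to derive_default_model_family.
let _fallback = find_family_for_model("some-future-model");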

Model-Specific Instructions

Different model families get different system prompts:

// Each family can have its own prompt
const BASE_INSTRUCTIONS: &str = include_str!("../../prompt.md");
const GPT_5_CODEX_INSTRUCTIONS: &str = include_str!("../../gpt_5_codex_prompt.md");
const GPT_5_1_INSTRUCTIONS: &str = include_str!("../../gpt_5_1_prompt.md");
const GPT_5_1_CODEX_MAX_INSTRUCTIONS: &str = include_str!("../../gpt-5.1-codex-max_prompt.md");

Remote Model Overrides

Model properties can be updated from a remote API:

impl ModelFamily {
    pub fn with_remote_overrides(mut self, remote_models: Vec<ModelInfo>) -> Self {
        for model in remote_models {
            if model.slug == self.slug {
                // Override with server-provided values
                self.default_reasoning_effort = Some(model.default_reasoning_level);
                self.shell_type = model.shell_type;
            }
        }
        self
    }
}
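
Layering order matters here: static family defaults are resolved first, then remote values are applied on top (construct_model_family in Part 6 also inserts config overrides between the two). A minimal sketch, assuming a remote_models: Vec<ModelInfo> has already been fetched:

// Static defaults first, then server-provided overrides on top.
let family = find_family_for_model("gpt-5-codex")
    .with_remote_overrides(remote_models);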

Part 2: Model Selection for Sub-Agents (Claude Code)

Claude Code allows each sub-agent to specify its own model:

Agent Model Configuration

---
name: code-reviewer
description: Reviews code for bugs and security issues
model: sonnet        # Can be: inherit, sonnet, opus, haiku
color: red
tools: [Glob, Grep, Read]
---

You are an expert code reviewer...

Model Options

Value     Behavior
-------   --------------------------------------
inherit   Use the same model as the parent agent
sonnet    Use Claude Sonnet (balanced)
opus      Use Claude Opus (most capable)
haiku     Use Claude Haiku (fast, efficient)
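
A hypothetical sketch of how this resolution could work; SubAgentModel and resolve_model are illustrative names, not Claude Code's actual API:

/// Hypothetical: a sub-agent's parsed `model:` value.
enum SubAgentModel {
    Inherit,
    Named(String), // "sonnet", "opus", or "haiku"
}

fn resolve_model(choice: &SubAgentModel, parent_model: &str) -> String {
    match choice {
        // `inherit` defers to whatever the parent agent is running.
        SubAgentModel::Inherit => parent_model.to_string(),
        // A named alias selects that model regardless of the parent.
        SubAgentModel::Named(alias) => alias.clone(),
    }
}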

When to Use Each Model

Use `inherit` (default)
- Most agents should inherit
- Ensures consistency with the user's choice
- Reduces configuration complexity

Use `haiku`
- Simple, fast tasks
- Validation checks
- Quick lookups
- Cost-sensitive operations

Use `sonnet`
- Balanced performance
- Code analysis
- Documentation generation
- Most development tasks

Use `opus`
- Complex reasoning
- Architecture decisions
- Security analysis
- Multi-file refactoring

Real-World Examples

# Simple validation - use haiku for speed
---
name: syntax-checker
model: haiku
tools: [Read]
---

# Complex analysis - use opus for capability
---
name: code-reviewer
model: opus
tools: [Glob, Grep, Read]
---

# Standard task - inherit user's choice
---
name: test-generator
model: inherit
tools: [Read, Write, Bash]
---

Part 3: Model Swapping for Tasks (Codex)

Codex spawns sub-agents with modified configurations (and, where needed, different models) for specific tasks such as code review:

Review Task with Sub-Agent

async fn start_review_conversation(
    session: Arc<SessionTaskContext>,
    ctx: Arc<TurnContext>,
    input: Vec<UserInput>,
    cancellation_token: CancellationToken,
) -> Option<Receiver<Event>> {
    let config = ctx.client.config();
    
    // Create modified config for the review sub-agent
    let mut sub_agent_config = config.as_ref().clone();
    
    // Restrict to read-only sandbox
    sub_agent_config.sandbox_policy = SandboxPolicy::new_read_only_policy();
    
    // Clear outer user instructions (reviewer has its own rubric)
    sub_agent_config.user_instructions = None;
    
    // Don't load project docs
    sub_agent_config.project_doc_max_bytes = 0;
    
    // Disable certain features for review
    sub_agent_config.features
        .disable(Feature::WebSearchRequest)
        .disable(Feature::ViewImageTool);

    // Set review-specific instructions
    sub_agent_config.base_instructions = Some(REVIEW_PROMPT.to_string());
    
    // Launch sub-conversation with modified config
    run_codex_conversation_one_shot(
        sub_agent_config,
        session.auth_manager(),
        session.models_manager(),
        input,
        session.clone_session(),
        ctx.clone(),
        cancellation_token,
        None,
    ).await
}

Sub-Agent Model Construction

When spawning a sub-agent, Codex constructs a new model family:

pub async fn run_codex_conversation_one_shot(
    config: Config,
    auth_manager: Arc<AuthManager>,
    models_manager: Arc<ModelsManager>,
    input: Vec<UserInput>,
    // ...
) -> Result<CodexIO, CodexErr> {
    // The config.model determines which model family is used
    let model_family = models_manager
        .construct_model_family(&config.model, &config)
        .await;
    
    // Sub-agent gets its own client with this model family
    // (provider resolution elided)
    let client = Client::new(
        provider.clone(),
        config.model.clone(),
        model_family,
        config.reasoning_effort,
        // ...
    );
    
    // ...
}

Part 4: Background Processing Models

Compaction (Summarization)

Codex uses the same model for compaction by default:

pub async fn run_compact_task(
    sess: Arc<Session>,
    turn_context: Arc<TurnContext>,
    input: Vec<UserInput>,
) -> CodexResult<()> {
    // Uses the same client (and model) as the main session
    let prompt = Prompt {
        input: turn_context.history.get_history_for_prompt(),
        tools: vec![],  // No tools for compaction
        parallel_tool_calls: false,
        base_instructions_override: turn_context.base_instructions.clone(),
        output_schema: None,
    };
    
    // Stream through same client
    let stream = turn_context.client.clone().stream(&prompt).await?;
    // ...
}

Remote Compaction (Alternative)

For some auth modes, Codex uses a dedicated compaction endpoint:

pub fn should_use_remote_compact_task(session: &Session) -> bool {
    session.services.auth_manager.auth()
        .is_some_and(|auth| auth.mode == AuthMode::ChatGPT)
        && session.enabled(Feature::RemoteCompaction)
}

async fn run_remote_compact_task_inner_impl(
    sess: &Arc<Session>,
    turn_context: &Arc<TurnContext>,
) -> CodexResult<()> {
    // Uses a dedicated compact endpoint that may use a different model
    // (prompt construction elided)
    let new_history = turn_context.client
        .compact_conversation_history(&prompt)
        .await?;
    
    sess.replace_history(new_history).await;
    // ...
}
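
Tying the two paths together, a hedged dispatch sketch (this call site is assumed for illustration, not taken from Codex source):

// Pick local vs. remote compaction for this turn.
if should_use_remote_compact_task(&sess) {
    // ChatGPT-auth sessions with RemoteCompaction enabled use the
    // dedicated endpoint, which may run a different model.
    run_remote_compact_task_inner_impl(&sess, &turn_context).await?;
} else {
    // Everyone else summarizes with the session's own model.
    run_compact_task(sess.clone(), turn_context.clone(), input).await?;
}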

Part 5: Tool-Specific Model Configuration

Tools Configuration by Model Family

Different model families expose different tools:

pub struct ToolsConfig<'a> {
    model_family: &'a ModelFamily,
    experimental_enabled: bool,
    mcp_tools: Vec<ToolSpec>,
    features: Features,
}

impl<'a> ToolsConfig<'a> {
    pub fn create_tools(&self) -> Vec<ToolSpec> {
        let mut tools = vec![];
        
        // Shell tool type depends on model family
        let shell_tool = match self.model_family.shell_type {
            ConfigShellToolType::Default => create_shell_tool(),
            ConfigShellToolType::Local => create_local_shell_tool(),
            ConfigShellToolType::ShellCommand => create_shell_command_tool(),
            ConfigShellToolType::UnifiedExec => create_unified_exec_tool(),
        };
        tools.push(shell_tool);
        
        // Experimental tools based on model family
        for tool_name in &self.model_family.experimental_supported_tools {
            if let Some(tool) = create_experimental_tool(tool_name) {
                tools.push(tool);
            }
        }
        
        // Apply patch tool type depends on model
        if let Some(patch_type) = &self.model_family.apply_patch_tool_type {
            tools.push(create_apply_patch_tool(patch_type));
        }
        
        tools
    }
}
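
A usage sketch, assuming the fields are constructible from this scope and that Features implements Default (neither is shown above):

// Build the tool list for the active model family.
let tools = ToolsConfig {
    model_family: &family,
    experimental_enabled: false,
    mcp_tools: vec![],
    features: Features::default(),
}
.create_tools();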

Model-Aware Truncation

Tool output truncation varies by model:

// Different models have different truncation policies
pub truncation_policy: TruncationPolicy,

// GPT-5 Codex: Token-based truncation
model_family!(
    "gpt-5-codex",
    truncation_policy: TruncationPolicy::Tokens(10_000),
)

// GPT-4: Byte-based truncation
model_family!(
    "gpt-4o",
    truncation_policy: TruncationPolicy::Bytes(10_000),
)
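
A minimal sketch of applying such a policy, assuming usize payloads for the variants and a rough 4-bytes-per-token estimate (real tokenization is model-specific):

enum TruncationPolicy {
    Tokens(usize),
    Bytes(usize),
}

fn truncate_output(output: &str, policy: &TruncationPolicy) -> String {
    let max_bytes = match policy {
        TruncationPolicy::Bytes(n) => *n,
        // Assumption for the sketch: ~4 bytes per token on average.
        TruncationPolicy::Tokens(n) => *n * 4,
    };
    if output.len() <= max_bytes {
        return output.to_string();
    }
    // Back up to a char boundary so the slice stays valid UTF-8.
    let mut end = max_bytes;
    while !output.is_char_boundary(end) {
        end -= 1;
    }
    format!("{}\n[... output truncated ...]", &output[..end])
}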

Part 6: Dynamic Model Management

Models Manager

Codex maintains a manager for available models:

pub struct ModelsManager {
    /// Available model presets (UI display)
    pub available_models: RwLock<Vec<ModelPreset>>,
    
    /// Remote model info (server-provided)
    pub remote_models: RwLock<Vec<ModelInfo>>,
    
    /// Auth manager for API access
    pub auth_manager: Arc<AuthManager>,
}

impl ModelsManager {
    /// Refresh available models from server
    pub async fn refresh_available_models(
        &self,
        provider: &ModelProviderInfo,
    ) -> CoreResult<Vec<ModelInfo>> {
        // Transport, provider, and auth wiring elided
        let client = ModelsClient::new(transport, api_provider, api_auth);
        
        let response = client.list_models(version, headers).await?;
        
        // Update cached models
        *self.remote_models.write().await = response.models.clone();
        *self.available_models.write().await = self.build_available_models().await;
        
        Ok(response.models)
    }
    
    /// Construct model family with remote overrides
    pub async fn construct_model_family(&self, model: &str, config: &Config) -> ModelFamily {
        find_family_for_model(model)
            .with_config_overrides(config)
            .with_remote_overrides(self.remote_models.read().await.clone())
    }
}

Model Switching at Runtime

Users can switch models mid-session:

// TUI handles /model command
async fn handle_model_switch(&mut self, new_model: String) {
    // Update config
    self.config.model = new_model.clone();
    
    // Construct new model family
    let model_family = self.models_manager
        .construct_model_family(&new_model, &self.config)
        .await;
    
    // Create new client with updated model
    self.turn_context = self.create_turn_context(model_family);
    
    // Notify user
    self.show_notification(format!("Switched to {}", new_model));
}

Part 7: Model Selection Patterns

Pattern 1: Inherit for Consistency

Most sub-agents should inherit the parent's model:

# Claude Code agent
model: inherit

// Codex: use the same config
let sub_config = parent_config.clone();

Pattern 2: Override for Specialization

Use specific models for specialized tasks:

# Complex code review needs best model
model: opus

# Quick validation can use fast model
model: haiku

Pattern 3: Restrict for Safety

Sub-agents can have restricted capabilities:

// Review sub-agent has limited features
sub_agent_config.sandbox_policy = SandboxPolicy::new_read_only_policy();
sub_agent_config.features
    .disable(Feature::WebSearchRequest)
    .disable(Feature::ViewImageTool);

Pattern 4: Model-Appropriate Tools

Expose different tools based on model capabilities:

// Only models that support parallel calls get full tool set
if model_family.supports_parallel_tool_calls {
    tools.extend(parallel_capable_tools());
}

// Experimental tools only for certain families
for tool in &model_family.experimental_supported_tools {
    tools.push(create_tool(tool));
}

Part 8: Best Practices

1. Default to Inheritance

# Prefer inherit unless you have a specific reason
model: inherit

2. Match Model to Task Complexity

Task Type                Recommended Model
----------------------   -----------------
Simple validation        haiku
Code analysis            sonnet
Architecture decisions   opus
User-facing tasks        inherit
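
The table collapses into a simple match; TaskKind and recommended_model are illustrative names for the sketch:

/// Hypothetical: map task complexity to a model choice.
enum TaskKind {
    SimpleValidation,
    CodeAnalysis,
    ArchitectureDecision,
    UserFacing,
}

fn recommended_model(task: TaskKind, user_model: &str) -> String {
    match task {
        TaskKind::SimpleValidation => "haiku".to_string(),
        TaskKind::CodeAnalysis => "sonnet".to_string(),
        TaskKind::ArchitectureDecision => "opus".to_string(),
        // User-facing work keeps the user's configured model.
        TaskKind::UserFacing => user_model.to_string(),
    }
}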

3. Consider Cost and Latency

// For background tasks, consider using smaller models
if is_background_task {
    config.model = "fast-model".to_string();
}

4. Preserve User Intent

// Main conversation should use user's chosen model
// Only override for specific sub-tasks
let main_model = user_config.model;
let review_model = if needs_deep_analysis {
    "opus"
} else {
    main_model.as_str()
};

5. Handle Model Unavailability

// Gracefully fall back if preferred model unavailable
let model = if available_models.contains(&preferred) {
    preferred
} else {
    default_model
};

6. Model-Aware Prompting

// Different models may need different instructions
let instructions = if model_family.needs_special_apply_patch_instructions {
    format!("{}\n{}", base, APPLY_PATCH_INSTRUCTIONS)
} else {
    base.to_string()
};

Summary

Model selection in LLM agents involves:

  1. Model Families: Grouping models with shared characteristics
  2. Sub-Agent Models: Allowing specialized models for specific tasks
  3. Background Processing: Using appropriate models for compaction/summarization
  4. Tool Configuration: Exposing model-appropriate tools
  5. Dynamic Management: Refreshing and switching models at runtime
  6. Best Practices: Balancing capability, cost, and user intent

The key insight is that different parts of an agent system may benefit from different models—the main agent using the user's choice, sub-agents using task-appropriate models, and background tasks using efficient models.