Model Selection and Swapping in LLM Agents

This guide covers how LLM agents select, configure, and swap models for different purposes—including the main agent, sub-agents, specialized tools, and background processing tasks.

Overview: Where Models Are Used

┌─────────────────────────────────────────────────────────────────────────────┐
│                     MODEL USAGE IN AGENT SYSTEMS                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                      MAIN AGENT                                      │   │
│  │  Primary model for user interaction and task execution               │   │
│  │  Model: User-configured (e.g., gpt-5-codex, claude-sonnet)          │   │
│  └──────────────────────────────┬──────────────────────────────────────┘   │
│                                 │                                           │
│         ┌───────────────────────┼───────────────────────┐                  │
│         │                       │                       │                  │
│         ▼                       ▼                       ▼                  │
│  ┌─────────────┐    ┌───────────────────┐    ┌─────────────────────┐      │
│  │ SUB-AGENTS  │    │ BACKGROUND TASKS  │    │ SPECIALIZED TOOLS   │      │
│  │             │    │                   │    │                     │      │
│  │ • Review    │    │ • Compaction      │    │ • Code analysis     │      │
│  │ • Analysis  │    │ • Summarization   │    │ • Search ranking    │      │
│  │ • Testing   │    │ • Auto-compact    │    │ • Validation        │      │
│  │             │    │                   │    │                     │      │
│  │ Model:      │    │ Model:            │    │ Model:              │      │
│  │ Configurable│    │ Same as main OR   │    │ Often smaller/      │      │
│  │ per-agent   │    │ dedicated compact │    │ specialized         │      │
│  └─────────────┘    └───────────────────┘    └─────────────────────┘      │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
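
One way to picture these slots is a small configuration sketch. This is illustrative only; the type, field, and helper names below are assumptions for the sketch, not a real Codex or Claude Code schema.

use std::collections::HashMap;

/// Hypothetical: the model "slots" an agent system might carry.
struct AgentModels {
    /// User-configured primary model, e.g. "gpt-5-codex"
    main: String,
    /// Per-sub-agent overrides, e.g. "code-reviewer" -> "opus"
    sub_agents: HashMap<String, String>,
    /// Background/compaction model; None means reuse the main model
    compaction: Option<String>,
}

impl AgentModels {
    /// Resolve a sub-agent's model, falling back to the main model.
    fn model_for_agent(&self, name: &str) -> &str {
        self.sub_agents
            .get(name)
            .map(String::as_str)
            .unwrap_or(self.main.as_str())
    }
}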

Part 1: Model Families (Codex)

Codex groups models into "families" that share characteristics:

pub struct ModelFamily {
    /// Full model slug (e.g., "gpt-5.1-codex-2025-01-01")
    pub slug: String,
    
    /// Family name (e.g., "gpt-5.1-codex")
    pub family: String,
    
    /// Whether model needs special apply_patch instructions
    pub needs_special_apply_patch_instructions: bool,
    
    /// Whether model supports reasoning summaries
    pub supports_reasoning_summaries: bool,
    
    /// Default reasoning effort (low/medium/high)
    pub default_reasoning_effort: Option<ReasoningEffort>,
    
    /// Whether model supports parallel tool calls
    pub supports_parallel_tool_calls: bool,
    
    /// Type of apply_patch tool to use
    pub apply_patch_tool_type: Option<ApplyPatchToolType>,
    
    /// Base system prompt for this family
    pub base_instructions: String,
    
    /// Experimental tools available to this family
    pub experimental_supported_tools: Vec<String>,
    
    /// Effective context window (percentage of total)
    pub effective_context_window_percent: i64,
    
    /// Preferred shell tool type
    pub shell_type: ConfigShellToolType,
    
    /// Default truncation policy for tool outputs
    pub truncation_policy: TruncationPolicy,
}

Family Resolution

Models are matched to families by prefix:

pub fn find_family_for_model(slug: &str) -> ModelFamily {
    if slug.starts_with("o3") {
        model_family!(
            slug, "o3",
            supports_reasoning_summaries: true,
            needs_special_apply_patch_instructions: true,
        )
    } else if slug.starts_with("gpt-5.1-codex-max") {
        model_family!(
            slug, slug,
            supports_reasoning_summaries: true,
            base_instructions: GPT_5_1_CODEX_MAX_INSTRUCTIONS.to_string(),
            apply_patch_tool_type: Some(ApplyPatchToolType::Freeform),
            shell_type: ConfigShellToolType::ShellCommand,
            supports_parallel_tool_calls: true,
            truncation_policy: TruncationPolicy::Tokens(10_000),
        )
    } else if slug.starts_with("gpt-5-codex") {
        model_family!(
            slug, slug,
            supports_reasoning_summaries: true,
            base_instructions: GPT_5_CODEX_INSTRUCTIONS.to_string(),
            apply_patch_tool_type: Some(ApplyPatchToolType::Freeform),
            // ...
        )
    } else {
        // Default family for unknown models
        derive_default_model_family(slug)
    }
}
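
Because matching is prefix-based, arm order matters: longer prefixes such as "gpt-5.1-codex-max" must be checked before shorter ones such as "gpt-5-codex", or the wrong family wins. A small usage sketch, assuming the macro populates the fields shown earlier:

// Dated slugs resolve to their family by prefix.
let fam = find_family_for_model("o3-2025-04-16");
assert_eq!(fam.family, "o3");
assert!(fam.supports_reasoning_summaries);

// Unrecognized slugs fall through to derive_default_model_family.
let _fallback = find_family_for_model("some-future-model");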

Model-Specific Instructions

Different model families get different system prompts:

// Each family can have its own prompt
const BASE_INSTRUCTIONS: &str = include_str!("../../prompt.md");
const GPT_5_CODEX_INSTRUCTIONS: &str = include_str!("../../gpt_5_codex_prompt.md");
const GPT_5_1_INSTRUCTIONS: &str = include_str!("../../gpt_5_1_prompt.md");
const GPT_5_1_CODEX_MAX_INSTRUCTIONS: &str = include_str!("../../gpt-5.1-codex-max_prompt.md");

Remote Model Overrides

Model properties can be updated from a remote API:

impl ModelFamily {
    pub fn with_remote_overrides(mut self, remote_models: Vec<ModelInfo>) -> Self {
        for model in remote_models {
            if model.slug == self.slug {
                // Override with server-provided values
                self.default_reasoning_effort = Some(model.default_reasoning_level);
                self.shell_type = model.shell_type;
            }
        }
        self
    }
}
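
Layering order matters here: static family defaults are resolved first, then remote values are applied on top (construct_model_family in Part 6 also inserts config overrides between the two). A minimal sketch, assuming a remote_models: Vec<ModelInfo> has already been fetched:

// Static defaults first, then server-provided overrides on top.
let family = find_family_for_model("gpt-5-codex")
    .with_remote_overrides(remote_models);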

Part 2: Model Selection for Sub-Agents (Claude Code)

Claude Code allows each sub-agent to specify its own model:

Agent Model Configuration

---
name: code-reviewer
description: Reviews code for bugs and security issues
model: sonnet        # Can be: inherit, sonnet, opus, haiku
color: red
tools: [Glob, Grep, Read]
---

You are an expert code reviewer...

Model Options

Value     Behavior
-------   --------------------------------------
inherit   Use the same model as the parent agent
sonnet    Use Claude Sonnet (balanced)
opus      Use Claude Opus (most capable)
haiku     Use Claude Haiku (fast, efficient)
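
A hypothetical sketch of how this resolution could work; SubAgentModel and resolve_model are illustrative names, not Claude Code's actual API:

/// Hypothetical: a sub-agent's parsed `model:` value.
enum SubAgentModel {
    Inherit,
    Named(String), // "sonnet", "opus", or "haiku"
}

fn resolve_model(choice: &SubAgentModel, parent_model: &str) -> String {
    match choice {
        // `inherit` defers to whatever the parent agent is running.
        SubAgentModel::Inherit => parent_model.to_string(),
        // A named alias selects that model regardless of the parent.
        SubAgentModel::Named(alias) => alias.clone(),
    }
}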

When to Use Each Model

Use `inherit` (default)
- Most agents should inherit
- Ensures consistency with the user's choice
- Reduces configuration complexity

Use `haiku`
- Simple, fast tasks
- Validation checks
- Quick lookups
- Cost-sensitive operations

Use `sonnet`
- Balanced performance
- Code analysis
- Documentation generation
- Most development tasks

Use `opus`
- Complex reasoning
- Architecture decisions
- Security analysis
- Multi-file refactoring

Real-World Examples

# Simple validation - use haiku for speed
---
name: syntax-checker
model: haiku
tools: [Read]
---

# Complex analysis - use opus for capability
---
name: code-reviewer
model: opus
tools: [Glob, Grep, Read]
---

# Standard task - inherit user's choice
---
name: test-generator
model: inherit
tools: [Read, Write, Bash]
---

Part 3: Model Swapping for Tasks (Codex)

Codex spawns sub-agents with modified configurations (and, where needed, different models) for specific tasks such as code review:

Review Task with Sub-Agent

async fn start_review_conversation(
    session: Arc<SessionTaskContext>,
    ctx: Arc<TurnContext>,
    input: Vec<UserInput>,
    cancellation_token: CancellationToken,
) -> Option<Receiver<Event>> {
    let config = ctx.client.config();
    
    // Create modified config for the review sub-agent
    let mut sub_agent_config = config.as_ref().clone();
    
    // Restrict to read-only sandbox
    sub_agent_config.sandbox_policy = SandboxPolicy::new_read_only_policy();
    
    // Clear outer user instructions (reviewer has its own rubric)
    sub_agent_config.user_instructions = None;
    
    // Don't load project docs
    sub_agent_config.project_doc_max_bytes = 0;
    
    // Disable certain features for review
    sub_agent_config.features
        .disable(Feature::WebSearchRequest)
        .disable(Feature::ViewImageTool);

    // Set review-specific instructions
    sub_agent_config.base_instructions = Some(REVIEW_PROMPT.to_string());
    
    // Launch sub-conversation with modified config
    run_codex_conversation_one_shot(
        sub_agent_config,
        session.auth_manager(),
        session.models_manager(),
        input,
        session.clone_session(),
        ctx.clone(),
        cancellation_token,
        None,
    ).await
}

Sub-Agent Model Construction

When spawning a sub-agent, Codex constructs a new model family:

pub async fn run_codex_conversation_one_shot(
    config: Config,
    auth_manager: Arc<AuthManager>,
    models_manager: Arc<ModelsManager>,
    input: Vec<UserInput>,
    // ...
) -> Result<CodexIO, CodexErr> {
    // The config.model determines which model family is used
    let model_family = models_manager
        .construct_model_family(&config.model, &config)
        .await;
    
    // Sub-agent gets its own client with this model family
    // (provider resolution elided)
    let client = Client::new(
        provider.clone(),
        config.model.clone(),
        model_family,
        config.reasoning_effort,
        // ...
    );
    
    // ...
}

Part 4: Background Processing Models

Compaction (Summarization)

Codex uses the same model for compaction by default:

pub async fn run_compact_task(
    sess: Arc<Session>,
    turn_context: Arc<TurnContext>,
    input: Vec<UserInput>,
) -> CodexResult<()> {
    // Uses the same client (and model) as the main session
    let prompt = Prompt {
        input: turn_context.history.get_history_for_prompt(),
        tools: vec![],  // No tools for compaction
        parallel_tool_calls: false,
        base_instructions_override: turn_context.base_instructions.clone(),
        output_schema: None,
    };
    
    // Stream through same client
    let stream = turn_context.client.clone().stream(&prompt).await?;
    // ...
}

Remote Compaction (Alternative)

For some auth modes, Codex uses a dedicated compaction endpoint:

pub fn should_use_remote_compact_task(session: &Session) -> bool {
    session.services.auth_manager.auth()
        .is_some_and(|auth| auth.mode == AuthMode::ChatGPT)
        && session.enabled(Feature::RemoteCompaction)
}

async fn run_remote_compact_task_inner_impl(
    sess: &Arc<Session>,
    turn_context: &Arc<TurnContext>,
) -> CodexResult<()> {
    // Uses a dedicated compact endpoint that may use a different model
    // (prompt construction elided)
    let new_history = turn_context.client
        .compact_conversation_history(&prompt)
        .await?;
    
    sess.replace_history(new_history).await;
    // ...
}
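
Tying the two paths together, a hedged dispatch sketch (this call site is assumed for illustration, not taken from Codex source):

// Pick local vs. remote compaction for this turn.
if should_use_remote_compact_task(&sess) {
    // ChatGPT-auth sessions with RemoteCompaction enabled use the
    // dedicated endpoint, which may run a different model.
    run_remote_compact_task_inner_impl(&sess, &turn_context).await?;
} else {
    // Everyone else summarizes with the session's own model.
    run_compact_task(sess.clone(), turn_context.clone(), input).await?;
}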

Part 5: Tool-Specific Model Configuration

Tools Configuration by Model Family

Different model families expose different tools:

pub struct ToolsConfig<'a> {
    model_family: &'a ModelFamily,
    experimental_enabled: bool,
    mcp_tools: Vec<ToolSpec>,
    features: Features,
}

impl<'a> ToolsConfig<'a> {
    pub fn create_tools(&self) -> Vec<ToolSpec> {
        let mut tools = vec![];
        
        // Shell tool type depends on model family
        let shell_tool = match self.model_family.shell_type {
            ConfigShellToolType::Default => create_shell_tool(),
            ConfigShellToolType::Local => create_local_shell_tool(),
            ConfigShellToolType::ShellCommand => create_shell_command_tool(),
            ConfigShellToolType::UnifiedExec => create_unified_exec_tool(),
        };
        tools.push(shell_tool);
        
        // Experimental tools based on model family
        for tool_name in &self.model_family.experimental_supported_tools {
            if let Some(tool) = create_experimental_tool(tool_name) {
                tools.push(tool);
            }
        }
        
        // Apply patch tool type depends on model
        if let Some(patch_type) = &self.model_family.apply_patch_tool_type {
            tools.push(create_apply_patch_tool(patch_type));
        }
        
        tools
    }
}
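
A usage sketch, assuming the fields are constructible from this scope and that Features implements Default (neither is shown above):

// Build the tool list for the active model family.
let tools = ToolsConfig {
    model_family: &family,
    experimental_enabled: false,
    mcp_tools: vec![],
    features: Features::default(),
}
.create_tools();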

Model-Aware Truncation

Tool output truncation varies by model:

// Different models have different truncation policies
pub truncation_policy: TruncationPolicy,

// GPT-5 Codex: Token-based truncation
model_family!(
    "gpt-5-codex",
    truncation_policy: TruncationPolicy::Tokens(10_000),
)

// GPT-4: Byte-based truncation
model_family!(
    "gpt-4o",
    truncation_policy: TruncationPolicy::Bytes(10_000),
)
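
A minimal sketch of applying such a policy, assuming usize payloads for the variants and a rough 4-bytes-per-token estimate (real tokenization is model-specific):

enum TruncationPolicy {
    Tokens(usize),
    Bytes(usize),
}

fn truncate_output(output: &str, policy: &TruncationPolicy) -> String {
    let max_bytes = match policy {
        TruncationPolicy::Bytes(n) => *n,
        // Assumption for the sketch: ~4 bytes per token on average.
        TruncationPolicy::Tokens(n) => *n * 4,
    };
    if output.len() <= max_bytes {
        return output.to_string();
    }
    // Back up to a char boundary so the slice stays valid UTF-8.
    let mut end = max_bytes;
    while !output.is_char_boundary(end) {
        end -= 1;
    }
    format!("{}\n[... output truncated ...]", &output[..end])
}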

Part 6: Dynamic Model Management

Models Manager

Codex maintains a manager for available models:

pub struct ModelsManager {
    /// Available model presets (UI display)
    pub available_models: RwLock<Vec<ModelPreset>>,
    
    /// Remote model info (server-provided)
    pub remote_models: RwLock<Vec<ModelInfo>>,
    
    /// Auth manager for API access
    pub auth_manager: Arc<AuthManager>,
}

impl ModelsManager {
    /// Refresh available models from server
    pub async fn refresh_available_models(
        &self,
        provider: &ModelProviderInfo,
    ) -> CoreResult<Vec<ModelInfo>> {
        // Transport, provider, and auth wiring elided
        let client = ModelsClient::new(transport, api_provider, api_auth);
        
        let response = client.list_models(version, headers).await?;
        
        // Update cached models
        *self.remote_models.write().await = response.models.clone();
        *self.available_models.write().await = self.build_available_models().await;
        
        Ok(response.models)
    }
    
    /// Construct model family with remote overrides
    pub async fn construct_model_family(&self, model: &str, config: &Config) -> ModelFamily {
        find_family_for_model(model)
            .with_config_overrides(config)
            .with_remote_overrides(self.remote_models.read().await.clone())
    }
}

Model Switching at Runtime

Users can switch models mid-session:

// TUI handles /model command
async fn handle_model_switch(&mut self, new_model: String) {
    // Update config
    self.config.model = new_model.clone();
    
    // Construct new model family
    let model_family = self.models_manager
        .construct_model_family(&new_model, &self.config)
        .await;
    
    // Create new client with updated model
    self.turn_context = self.create_turn_context(model_family);
    
    // Notify user
    self.show_notification(format!("Switched to {}", new_model));
}

Part 7: Model Selection Patterns

Pattern 1: Inherit for Consistency

Most sub-agents should inherit the parent's model:

# Claude Code agent
model: inherit

// Codex: use the same config
let sub_config = parent_config.clone();

Pattern 2: Override for Specialization

Use specific models for specialized tasks:

# Complex code review needs best model
model: opus

# Quick validation can use fast model
model: haiku

Pattern 3: Restrict for Safety

Sub-agents can have restricted capabilities:

// Review sub-agent has limited features
sub_agent_config.sandbox_policy = SandboxPolicy::new_read_only_policy();
sub_agent_config.features
    .disable(Feature::WebSearchRequest)
    .disable(Feature::ViewImageTool);

Pattern 4: Model-Appropriate Tools

Expose different tools based on model capabilities:

// Only models that support parallel calls get full tool set
if model_family.supports_parallel_tool_calls {
    tools.extend(parallel_capable_tools());
}

// Experimental tools only for certain families
for tool in &model_family.experimental_supported_tools {
    tools.push(create_tool(tool));
}

Part 8: Best Practices

1. Default to Inheritance

# Prefer inherit unless you have a specific reason
model: inherit

2. Match Model to Task Complexity

Task Type                Recommended Model
----------------------   -----------------
Simple validation        haiku
Code analysis            sonnet
Architecture decisions   opus
User-facing tasks        inherit
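
The table collapses into a simple match; TaskKind and recommended_model are illustrative names for the sketch:

/// Hypothetical: map task complexity to a model choice.
enum TaskKind {
    SimpleValidation,
    CodeAnalysis,
    ArchitectureDecision,
    UserFacing,
}

fn recommended_model(task: TaskKind, user_model: &str) -> String {
    match task {
        TaskKind::SimpleValidation => "haiku".to_string(),
        TaskKind::CodeAnalysis => "sonnet".to_string(),
        TaskKind::ArchitectureDecision => "opus".to_string(),
        // User-facing work keeps the user's configured model.
        TaskKind::UserFacing => user_model.to_string(),
    }
}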

3. Consider Cost and Latency

// For background tasks, consider using smaller models
if is_background_task {
    config.model = "fast-model".to_string();
}

4. Preserve User Intent

// Main conversation should use user's chosen model
// Only override for specific sub-tasks
let main_model = user_config.model;
let review_model = if needs_deep_analysis {
    "opus"
} else {
    main_model.as_str()
};

5. Handle Model Unavailability

// Gracefully fall back if preferred model unavailable
let model = if available_models.contains(&preferred) {
    preferred
} else {
    default_model
};

6. Model-Aware Prompting

// Different models may need different instructions
let instructions = if model_family.needs_special_apply_patch_instructions {
    format!("{}\n{}", base, APPLY_PATCH_INSTRUCTIONS)
} else {
    base.to_string()
};

Summary

Model selection in LLM agents involves:

  1. Model Families: Grouping models with shared characteristics
  2. Sub-Agent Models: Allowing specialized models for specific tasks
  3. Background Processing: Using appropriate models for compaction/summarization
  4. Tool Configuration: Exposing model-appropriate tools
  5. Dynamic Management: Refreshing and switching models at runtime
  6. Best Practices: Balancing capability, cost, and user intent

The key insight is that different parts of an agent system may benefit from different models—the main agent using the user's choice, sub-agents using task-appropriate models, and background tasks using efficient models.