How LLMs Are Instructed to Call Tools and How Responses Are Parsed

This guide explains the complete flow of tool calling in LLM agents: how the model is instructed about available tools, how it decides to call them, and how its responses are parsed and executed.

Overview: The Tool Calling Flow

┌──────────────────────────────────────────────────────────────────────┐
│                        TOOL CALLING FLOW                             │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  1. SETUP: Instruct the model about available tools                  │
│     ┌──────────────┐  ┌──────────────┐  ┌──────────────┐            │
│     │ System Prompt│  │ Tool Schemas │  │ User Context │            │
│     │ (behavior)   │  │ (JSON specs) │  │ (AGENTS.md)  │            │
│     └──────┬───────┘  └──────┬───────┘  └──────┬───────┘            │
│            │                 │                 │                     │
│            └─────────────────┼─────────────────┘                     │
│                              ▼                                       │
│  2. REQUEST: Send to model API with tool definitions                 │
│     ┌────────────────────────────────────────┐                      │
│     │  POST /v1/responses                    │                      │
│     │  {                                     │                      │
│     │    instructions: "...",                │                      │
│     │    input: [...messages...],            │                      │
│     │    tools: [...tool_specs...]           │                      │
│     │  }                                     │                      │
│     └────────────────────┬───────────────────┘                      │
│                          ▼                                           │
│  3. RESPONSE: Model streams response with tool calls                 │
│     ┌────────────────────────────────────────┐                      │
│     │  {                                     │                      │
│     │    type: "function_call",              │                      │
│     │    name: "shell",                      │                      │
│     │    arguments: "{\"command\":[...]}",   │                      │
│     │    call_id: "call_abc123"              │                      │
│     │  }                                     │                      │
│     └────────────────────┬───────────────────┘                      │
│                          ▼                                           │
│  4. EXECUTE: Parse response, execute tool, return result             │
│     ┌────────────────────────────────────────┐                      │
│     │  {                                     │                      │
│     │    type: "function_call_output",       │                      │
│     │    call_id: "call_abc123",             │                      │
│     │    output: "Exit code: 0\n..."         │                      │
│     │  }                                     │                      │
│     └────────────────────┬───────────────────┘                      │
│                          ▼                                           │
│  5. CONTINUE: Model processes result, may call more tools            │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘
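
Before walking through each stage, here is the loop in miniature. This is a sketch, not any particular harness's code: send_request and execute_tool are hypothetical stand-ins for a real API client and tool runtime, and the conversation history is simplified to plain strings.

// A minimal sketch of the agent loop from the diagram above.
#[derive(Debug)]
enum ModelItem {
    Text(String),
    FunctionCall { name: String, arguments: String, call_id: String },
}

fn send_request(_history: &[String]) -> Vec<ModelItem> {
    // Hypothetical: POST the conversation + tool schemas, collect output items.
    unimplemented!()
}

fn execute_tool(_name: &str, _arguments: &str) -> String {
    // Hypothetical: route to a handler, run it, capture its output.
    unimplemented!()
}

fn run_turn(mut history: Vec<String>) {
    loop {
        let mut made_tool_call = false;
        for item in send_request(&history) {
            match item {
                ModelItem::Text(text) => history.push(text),
                ModelItem::FunctionCall { name, arguments, call_id } => {
                    // Step 4: execute and feed the result back as a
                    // function_call_output keyed by call_id.
                    let output = execute_tool(&name, &arguments);
                    history.push(format!("function_call_output {call_id}: {output}"));
                    made_tool_call = true;
                }
            }
        }
        // Step 5: if the model called tools, loop so it can see the results;
        // otherwise the turn is complete.
        if !made_tool_call {
            break;
        }
    }
}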

Part 1: Instructing the Model About Tools

System Prompt

The system prompt establishes the agent's identity, capabilities, and behavioral guidelines. Here's how Codex does it:

You are a coding agent running in the Codex CLI, a terminal-based coding assistant.

Your capabilities:
- Receive user prompts and other context provided by the harness
- Communicate with the user by streaming thinking & responses
- Emit function calls to run terminal commands and apply patches

# Tool Guidelines

## Shell commands
When using the shell, you must adhere to the following guidelines:
- When searching for text or files, prefer using `rg` because it's faster
- Read files in chunks with a max chunk size of 250 lines

## `apply_patch`
Use the `apply_patch` shell command to edit files.
[...detailed patch format instructions...]

Key elements:

  1. Identity: Who the agent is
  2. Capabilities: What it can do
  3. Behavioral guidelines: How to behave (concise, precise, etc.)
  4. Tool-specific instructions: How to use each tool correctly

Tool Schema Definitions

Tools are defined as JSON schemas that the model receives with each request:

{
  "type": "function",
  "name": "shell",
  "description": "Runs a shell command and returns its output.",
  "parameters": {
    "type": "object",
    "properties": {
      "command": {
        "type": "array",
        "items": { "type": "string" },
        "description": "The command to execute"
      },
      "workdir": {
        "type": "string",
        "description": "The working directory"
      },
      "timeout_ms": {
        "type": "number",
        "description": "Timeout in milliseconds"
      }
    },
    "required": ["command"],
    "additionalProperties": false
  }
}
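
Because these schemas are plain JSON, a harness can build them programmatically. A sketch using serde_json (the helper name define_shell_tool is illustrative, not part of any real API):

use serde_json::{json, Value};

// Illustrative helper: constructs the shell tool schema shown above.
fn define_shell_tool() -> Value {
    json!({
        "type": "function",
        "name": "shell",
        "description": "Runs a shell command and returns its output.",
        "parameters": {
            "type": "object",
            "properties": {
                "command": {
                    "type": "array",
                    "items": { "type": "string" },
                    "description": "The command to execute"
                },
                "workdir": { "type": "string", "description": "The working directory" },
                "timeout_ms": { "type": "number", "description": "Timeout in milliseconds" }
            },
            "required": ["command"],
            "additionalProperties": false
        }
    })
}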

Key schema elements:

  1. name: The identifier the model must use when calling the tool
  2. description: Natural-language guidance telling the model what the tool does and when to use it
  3. parameters: A JSON Schema describing the arguments, their types, and per-field descriptions
  4. required: The arguments that must always be present
  5. additionalProperties: false: Instructs the model not to invent arguments outside the schema

Building the Request

Codex constructs the API request with all components:

pub struct Prompt {
    /// Conversation history (user messages, assistant responses, tool outputs)
    pub input: Vec<ResponseItem>,

    /// Tool specifications
    pub tools: Vec<ToolSpec>,

    /// Whether parallel tool calls are permitted
    pub parallel_tool_calls: bool,

    /// Optional override for the base instructions (system prompt)
    pub base_instructions_override: Option<String>,
}

impl Prompt {
    pub fn get_full_instructions(&self, model: &ModelFamily) -> Cow<str> {
        let base = self.base_instructions_override
            .as_deref()
            .unwrap_or(model.base_instructions.deref());
        
        // Add tool-specific instructions for models that need them
        if model.needs_special_apply_patch_instructions {
            Cow::Owned(format!("{base}\n{APPLY_PATCH_TOOL_INSTRUCTIONS}"))
        } else {
            Cow::Borrowed(base)
        }
    }
}

Sending to the API

The request is sent to the model API (OpenAI Responses API format):

pub struct ResponsesApiRequest<'a> {
    pub model: &'a str,
    pub instructions: &'a str,        // System prompt
    pub input: &'a [ResponseItem],    // Conversation history
    pub tools: &'a [serde_json::Value], // Tool definitions
    pub tool_choice: &'a str,         // "auto", "none", "required"
    pub parallel_tool_calls: bool,
    pub stream: bool,
}
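
Serialized with serde, this struct becomes the JSON body of the HTTP request. A hedged sketch of the assembly, with input simplified to raw serde_json::Value items and reqwest assumed as the HTTP client (the real client code differs):

use serde::Serialize;

#[derive(Serialize)]
struct ResponsesApiRequest<'a> {
    model: &'a str,
    instructions: &'a str,
    input: &'a [serde_json::Value],
    tools: &'a [serde_json::Value],
    tool_choice: &'a str,
    parallel_tool_calls: bool,
    stream: bool,
}

// Hypothetical assembly; the endpoint and bearer auth follow the OpenAI convention.
async fn send(
    req: &ResponsesApiRequest<'_>,
    api_key: &str,
) -> reqwest::Result<reqwest::Response> {
    reqwest::Client::new()
        .post("https://api.openai.com/v1/responses")
        .bearer_auth(api_key)
        .json(req)
        .send()
        .await
}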

Part 2: How the Model Calls Tools

Model's Decision Process

The model sees:

  1. Instructions: System prompt explaining its role and capabilities
  2. Tools: JSON schemas of available tools
  3. Conversation: User messages, previous assistant responses, tool results
  4. User request: Current user message

It then decides whether to:

  1. Respond directly with a text message, when no tool is needed
  2. Emit a function call whose arguments conform to one tool's JSON schema
  3. Emit several function calls in a single turn, when parallel_tool_calls is enabled
  4. Interleave reasoning or explanation with its tool calls, depending on the model family

Tool Call Output Format

When the model decides to call a tool, it outputs structured data:

{
  "type": "function_call",
  "id": "fc_abc123",
  "call_id": "call_xyz789",
  "name": "shell",
  "arguments": "{\"command\":[\"ls\",\"-la\"],\"workdir\":\"/home/user\"}"
}

Key fields:

  1. type: Marks this item as a function_call
  2. id: The response item's own identifier
  3. call_id: Correlates the call with the output the agent sends back
  4. name: The tool to invoke, matching a schema's name field
  5. arguments: The arguments as a JSON-encoded string (note the double encoding: a JSON string inside a JSON object)

Local Shell Call Format (OpenAI-specific)

OpenAI also supports a local_shell built-in tool type:

{
  "type": "local_shell_call",
  "id": "ls_123",
  "call_id": "call_456",
  "status": "in_progress",
  "action": {
    "type": "exec",
    "command": ["ls", "-la"],
    "working_directory": "/home/user",
    "timeout_ms": 30000
  }
}
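
On the Rust side, that action payload might map onto types like these (a sketch; the shapes are inferred from how build_tool_call consumes them in Part 3):

// Assumed shapes for the local_shell_call action payload.
pub enum LocalShellAction {
    Exec(LocalShellExecAction),
}

pub struct LocalShellExecAction {
    pub command: Vec<String>,
    pub working_directory: Option<String>,
    pub timeout_ms: Option<u64>,
}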

Part 3: Parsing the Model's Response

Response Item Types

Codex defines an enum to handle all possible response types:

pub enum ResponseItem {
    // Text message from the model
    Message {
        id: Option<String>,
        role: String,
        content: Vec<ContentItem>,
    },
    
    // Model's reasoning/thinking (for reasoning models)
    Reasoning {
        id: String,
        summary: Vec<ReasoningItemReasoningSummary>,
        content: Option<Vec<ReasoningItemContent>>,
    },
    
    // Standard function call
    FunctionCall {
        id: Option<String>,
        name: String,
        arguments: String,  // JSON string
        call_id: String,
    },
    
    // OpenAI's local shell tool
    LocalShellCall {
        id: Option<String>,
        call_id: Option<String>,
        status: LocalShellStatus,
        action: LocalShellAction,
    },
    
    // Custom/freeform tool call
    CustomToolCall {
        id: Option<String>,
        status: Option<String>,
        call_id: String,
        name: String,
        input: String,
    },
    
    // Web search (another built-in)
    WebSearchCall {
        id: Option<String>,
        status: Option<String>,
        action: WebSearchAction,
    },
    
    // Tool output (when returning results to model)
    FunctionCallOutput {
        call_id: String,
        output: FunctionCallOutputPayload,
    },
    
    // And more...
}
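
With serde, a tagged enum like this deserializes straight from the wire format: the type field selects the variant. A simplified sketch, assuming the snake_case tag convention visible in the JSON examples:

use serde::Deserialize;

#[derive(Deserialize, Debug)]
#[serde(tag = "type", rename_all = "snake_case")]
enum ResponseItem {
    FunctionCall {
        id: Option<String>,
        name: String,
        arguments: String, // still a JSON string at this point
        call_id: String,
    },
    #[serde(other)]
    Unknown, // catch-all for the variants this sketch omits
}

fn parse(raw: &str) -> serde_json::Result<ResponseItem> {
    serde_json::from_str(raw)
}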

Parsing Stream Events

The model's response is streamed as Server-Sent Events (SSE). As each output item completes, it is handled:

pub async fn handle_output_item_done(
    ctx: &mut HandleOutputCtx,
    item: ResponseItem,
) -> Result<OutputItemResult> {
    let mut output = OutputItemResult::default();

    // Try to build a tool call from the response item
    match ToolRouter::build_tool_call(ctx.sess.as_ref(), item.clone()).await {
        
        // It's a tool call - queue execution
        Ok(Some(call)) => {
            tracing::info!("ToolCall: {} {}", call.tool_name, call.payload.log_payload());

            // Record in conversation history
            ctx.sess.record_conversation_items(&ctx.turn_context, &[&item]).await;

            // Create an async task for tool execution
            // (tool_runtime and cancellation_token are captured from the
            // surrounding session setup, elided in this excerpt)
            let tool_future = Box::pin(async move {
                tool_runtime.handle_tool_call(call, cancellation_token).await
            });

            output.needs_follow_up = true;
            output.tool_future = Some(tool_future);
        }
        
        // Not a tool call - just a message
        Ok(None) => {
            ctx.sess.record_conversation_items(&ctx.turn_context, &[&item]).await;
            output.last_agent_message = last_assistant_message_from_item(&item);
        }
        
        // Error handling...
        Err(e) => { /* ... */ }
    }

    Ok(output)
}

Building Tool Calls from Response Items

The router converts response items into executable tool calls:

impl ToolRouter {
    pub async fn build_tool_call(
        session: &Session,
        item: ResponseItem,
    ) -> Result<Option<ToolCall>, FunctionCallError> {
        match item {
            // Standard function call
            ResponseItem::FunctionCall { name, arguments, call_id, .. } => {
                // Check if it's an MCP tool (prefixed with server name)
                if let Some((server, tool)) = session.parse_mcp_tool_name(&name).await {
                    Ok(Some(ToolCall {
                        tool_name: name,
                        call_id,
                        payload: ToolPayload::Mcp { server, tool, raw_arguments: arguments },
                    }))
                } else {
                    Ok(Some(ToolCall {
                        tool_name: name,
                        call_id,
                        payload: ToolPayload::Function { arguments },
                    }))
                }
            }
            
            // OpenAI's local shell
            ResponseItem::LocalShellCall { id, call_id, action, .. } => {
                let call_id = call_id.or(id)
                    .ok_or(FunctionCallError::MissingLocalShellCallId)?;

                match action {
                    LocalShellAction::Exec(exec) => {
                        let params = ShellToolCallParams {
                            command: exec.command,
                            workdir: exec.working_directory,
                            timeout_ms: exec.timeout_ms,
                            // ...
                        };
                        Ok(Some(ToolCall {
                            tool_name: "local_shell".to_string(),
                            call_id,
                            payload: ToolPayload::LocalShell { params },
                        }))
                    }
                }
            }
            
            // Custom tool call
            ResponseItem::CustomToolCall { name, input, call_id, .. } => {
                Ok(Some(ToolCall {
                    tool_name: name,
                    call_id,
                    payload: ToolPayload::Custom { input },
                }))
            }
            
            // Not a tool call
            _ => Ok(None),
        }
    }
}
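
The MCP branch above works because MCP tools are namespaced: one flat tool list can span many servers. A sketch of the name parsing, assuming a double-underscore delimiter (the real delimiter and validation are implementation-defined):

// Hypothetical: split a namespaced tool name like "server__tool" into its parts.
fn parse_mcp_tool_name(name: &str) -> Option<(String, String)> {
    let (server, tool) = name.split_once("__")?;
    if server.is_empty() || tool.is_empty() {
        return None;
    }
    Some((server.to_string(), tool.to_string()))
}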

Argument Parsing

Tool arguments come as a JSON string and need to be parsed:

impl ToolHandler for ShellHandler {
    async fn handle(&self, invocation: ToolInvocation) -> Result<ToolOutput, FunctionCallError> {
        match invocation.payload {
            ToolPayload::Function { arguments } => {
                // Parse JSON arguments into typed struct
                let params: ShellToolCallParams = serde_json::from_str(&arguments)
                    .map_err(|e| FunctionCallError::RespondToModel(
                        format!("failed to parse function arguments: {e:?}")
                    ))?;
                
                // Execute with parsed parameters
                self.run_shell(params, invocation.turn.as_ref()).await
            }
            
            ToolPayload::LocalShell { params } => {
                // Already parsed by build_tool_call
                self.run_shell(params, invocation.turn.as_ref()).await
            }
            
            _ => Err(FunctionCallError::RespondToModel("unsupported payload".into())),
        }
    }
}
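
The target struct mirrors the tool schema from Part 1, so serde can reject malformed arguments at the parsing boundary. A sketch of what ShellToolCallParams might look like (field shapes inferred from the schema; the real struct carries more fields):

use serde::Deserialize;

#[derive(Deserialize, Debug)]
#[serde(deny_unknown_fields)] // the struct-side analogue of additionalProperties: false
struct ShellToolCallParams {
    command: Vec<String>,
    workdir: Option<String>,
    timeout_ms: Option<u64>,
}

// Usage: serde_json::from_str::<ShellToolCallParams>(r#"{"command":["ls","-la"]}"#)
// succeeds, while an unknown field or a missing "command" becomes a parse error
// the agent can report back to the model.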

Part 4: Executing Tools and Returning Results

Tool Execution

After parsing, the tool is executed:

// Tool handler executes the command
let output = handler.handle(invocation).await?;

// Convert to response format
let response = output.into_response(&call_id, &payload);

Formatting Results for the Model

Results must be formatted so the model can understand them:

pub enum ToolOutput {
    Function {
        content: String,           // Plain text result
        content_items: Option<Vec<ContentItem>>, // Structured content
        success: Option<bool>,
    },
    Mcp {
        result: Result<CallToolResult, String>,
    },
}

impl ToolOutput {
    pub fn into_response(self, call_id: &str, payload: &ToolPayload) -> ResponseInputItem {
        match self {
            ToolOutput::Function { content, content_items, success } => {
                ResponseInputItem::FunctionCallOutput {
                    call_id: call_id.to_string(),
                    output: FunctionCallOutputPayload {
                        content,
                        content_items,
                        success,
                    },
                }
            }
            // ...
        }
    }
}

Shell Output Formatting

For shell commands, output includes metadata:

pub fn format_exec_output_for_model(exec_output: &ExecToolCallOutput) -> String {
    // `formatted_output` is the aggregated (and possibly truncated)
    // stdout/stderr text, prepared earlier in this function and elided here
    let payload = ExecOutput {
        output: &formatted_output,
        metadata: ExecMetadata {
            exit_code: exec_output.exit_code,
            duration_seconds: exec_output.duration.as_secs_f32(),
        },
    };
    
    serde_json::to_string(&payload).expect("serialize")
}
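
The ExecOutput and ExecMetadata types are not shown in the excerpt; their shapes can be inferred from the example below (a sketch):

use serde::Serialize;

#[derive(Serialize)]
struct ExecOutput<'a> {
    output: &'a str,
    metadata: ExecMetadata,
}

#[derive(Serialize)]
struct ExecMetadata {
    exit_code: i32,
    duration_seconds: f32,
}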

Example output returned to model:

{
  "output": "src/\npackage.json\nREADME.md",
  "metadata": {
    "exit_code": 0,
    "duration_seconds": 0.1
  }
}

Part 5: Sub-Agents in Claude Code

How Sub-Agents Work

Claude Code's sub-agents are autonomous processes triggered by the main agent:

  1. Agent Definition: Markdown file with frontmatter
  2. Triggering: Main agent decides to use sub-agent based on description
  3. Execution: Sub-agent runs with its own system prompt and tools
  4. Result: Sub-agent completes and returns to main agent

Agent File Structure

---
name: code-reviewer
description: Reviews code for bugs, security vulnerabilities, and quality issues
tools: Glob, Grep, LS, Read, NotebookRead
model: sonnet
color: red
---

You are an expert code reviewer specializing in modern software development.

## Core Review Responsibilities

**Bug Detection**: Identify actual bugs - logic errors, null handling, race conditions...

**Code Quality**: Evaluate significant issues like code duplication, missing error handling...

## Output Guidance

For each issue, provide:
- Clear description with confidence score
- File path and line number
- Concrete fix suggestion
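
Before it can register an agent, a harness must split the YAML frontmatter from the markdown body. A minimal sketch, assuming serde_yaml for the YAML block (Claude Code's actual loader is not public):

use serde::Deserialize;

#[derive(Deserialize, Debug)]
struct AgentFrontmatter {
    name: String,
    description: String,
    tools: Option<String>, // comma-separated list, per the example above
    model: Option<String>,
    color: Option<String>,
}

// Split "---\n<yaml>\n---\n<body>" into the frontmatter and the system prompt.
fn parse_agent_file(raw: &str) -> Option<(AgentFrontmatter, String)> {
    let rest = raw.strip_prefix("---")?;
    let (yaml, body) = rest.split_once("\n---")?;
    let frontmatter: AgentFrontmatter = serde_yaml::from_str(yaml).ok()?;
    Some((frontmatter, body.trim_start().to_string()))
}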

Triggering Mechanism

The main agent sees the agent description and decides when to use it:

description: Use this agent when the user asks to "review code", "check for bugs",
"analyze my changes", or after completing a significant code change. Examples:

<example>
Context: User just implemented authentication
user: "I've added OAuth login"
assistant: "I'll use the code-reviewer agent to check the implementation."
<commentary>
Auth code written, trigger review for security and best practices.
</commentary>
</example>

Agent Invocation

When the main agent decides to use a sub-agent, it uses a special tool (Task tool in Claude Code):

assistant: "I'll use the code-reviewer agent to analyze your changes."
[Task tool invoked with agent-name and context]
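
Rendered as a function call, that invocation might look roughly like this (a hypothetical sketch; Claude Code's actual wire format is not documented here):

{
  "type": "function_call",
  "name": "Task",
  "arguments": "{\"subagent_type\":\"code-reviewer\",\"description\":\"Review OAuth changes\",\"prompt\":\"Review the new OAuth login implementation for bugs and security issues.\"}"
}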

The sub-agent then:

  1. Receives its own system prompt (the markdown body)
  2. Has access only to its specified tools
  3. Executes autonomously until complete
  4. Returns result to main agent

Part 6: Complete Request/Response Cycle

Full API Request Example

{
  "model": "gpt-5-codex",
  "instructions": "You are a coding agent running in the Codex CLI...",
  "input": [
    {
      "type": "message",
      "role": "user",
      "content": [{"type": "input_text", "text": "List files in the current directory"}]
    }
  ],
  "tools": [
    {
      "type": "function",
      "name": "shell",
      "description": "Runs a shell command...",
      "parameters": {
        "type": "object",
        "properties": {
          "command": {"type": "array", "items": {"type": "string"}}
        },
        "required": ["command"]
      }
    }
  ],
  "tool_choice": "auto",
  "parallel_tool_calls": true,
  "stream": true
}

Streamed Response Events

event: response.output_item.added
data: {"type":"function_call","name":"shell","call_id":"call_123","arguments":""}

event: response.function_call_arguments.delta
data: {"delta":"{\"command\":[\"ls\",\"-la\"]}"}

event: response.output_item.done
data: {"type":"function_call","name":"shell","call_id":"call_123","arguments":"{\"command\":[\"ls\",\"-la\"]}"}

Tool Output Sent Back

The output field is always a single string; here it uses a plain-text encoding, whereas Part 4 showed a structured JSON encoding of the same information:

{
  "type": "function_call_output",
  "call_id": "call_123",
  "output": "Exit code: 0\nWall time: 0.1 seconds\nOutput:\ntotal 24\ndrwxr-xr-x  5 user user  160 Jan 1 12:00 .\n..."
}

Summary

The tool calling flow involves:

  1. Instruction: System prompt + tool schemas tell the model what's available
  2. Decision: Model analyzes context and decides whether to call tools
  3. Output: Model outputs structured tool call with name and arguments
  4. Parsing: Agent parses the JSON, extracts tool name and arguments
  5. Routing: Router maps tool name to handler
  6. Execution: Handler runs the tool with parsed arguments
  7. Formatting: Output is formatted for model consumption
  8. Continuation: Model receives result and continues (or calls more tools)

For sub-agents (Claude Code):

  1. Definition: Markdown files with YAML frontmatter declare each agent's name, description, tools, and system prompt
  2. Triggering: The main agent matches the current task against agent descriptions
  3. Invocation: The Task tool launches the sub-agent with its own context and restricted tool set
  4. Return: The sub-agent runs autonomously and hands a single result back to the main agent

This architecture enables powerful, extensible tool systems where the LLM becomes an intelligent orchestrator of capabilities.