This guide explains the complete flow of tool calling in LLM agents: how the model is instructed about available tools, how it decides to call them, and how its responses are parsed and executed.
┌──────────────────────────────────────────────────────────────────────┐
│ TOOL CALLING FLOW │
├──────────────────────────────────────────────────────────────────────┤
│ │
│ 1. SETUP: Instruct the model about available tools │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ System Prompt│ │ Tool Schemas │ │ User Context │ │
│ │ (behavior) │ │ (JSON specs) │ │ (AGENTS.md) │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ └─────────────────┼─────────────────┘ │
│ ▼ │
│ 2. REQUEST: Send to model API with tool definitions │
│ ┌────────────────────────────────────────┐ │
│ │ POST /v1/responses │ │
│ │ { │ │
│ │ instructions: "...", │ │
│ │ input: [...messages...], │ │
│ │ tools: [...tool_specs...] │ │
│ │ } │ │
│ └────────────────────┬───────────────────┘ │
│ ▼ │
│ 3. RESPONSE: Model streams response with tool calls │
│ ┌────────────────────────────────────────┐ │
│ │ { │ │
│ │ type: "function_call", │ │
│ │ name: "shell", │ │
│ │ arguments: "{\"command\":[...]}", │ │
│ │ call_id: "call_abc123" │ │
│ │ } │ │
│ └────────────────────┬───────────────────┘ │
│ ▼ │
│ 4. EXECUTE: Parse response, execute tool, return result │
│ ┌────────────────────────────────────────┐ │
│ │ { │ │
│ │ type: "function_call_output", │ │
│ │ call_id: "call_abc123", │ │
│ │ output: "Exit code: 0\n..." │ │
│ │ } │ │
│ └────────────────────┬───────────────────┘ │
│ ▼ │
│ 5. CONTINUE: Model processes result, may call more tools │
│ │
└──────────────────────────────────────────────────────────────────────┘
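Before looking at each step in detail, here is a minimal sketch of the outer loop. It assumes the ResponseItem types introduced later in this guide; send_request and execute_tool are hypothetical stand-ins for the client and tool-execution code, not Codex APIs.

// Minimal sketch of the agent loop from the diagram above.
// send_request and execute_tool are hypothetical helpers, not Codex APIs.
async fn run_turn(
    mut history: Vec<ResponseItem>,
    tools: &[serde_json::Value],
) -> Result<(), Box<dyn std::error::Error>> {
    loop {
        // 2. REQUEST: send conversation history plus tool definitions
        let items = send_request(&history, tools).await?;

        // 3. RESPONSE: scan the model's output for tool calls
        let mut called_tool = false;
        for item in items {
            history.push(item.clone());
            if let ResponseItem::FunctionCall { name, arguments, call_id, .. } = item {
                // 4. EXECUTE: run the tool and append its output for the model
                let output = execute_tool(&name, &arguments).await?;
                history.push(ResponseItem::FunctionCallOutput { call_id, output });
                called_tool = true;
            }
        }

        // 5. CONTINUE: loop again only if the model asked for tool results
        if !called_tool {
            return Ok(());
        }
    }
}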
The system prompt establishes the agent's identity, capabilities, and behavioral guidelines. Here's how Codex does it:
You are a coding agent running in the Codex CLI, a terminal-based coding assistant.
Your capabilities:
- Receive user prompts and other context provided by the harness
- Communicate with the user by streaming thinking & responses
- Emit function calls to run terminal commands and apply patches
# Tool Guidelines
## Shell commands
When using the shell, you must adhere to the following guidelines:
- When searching for text or files, prefer using `rg` because it's faster
- Read files in chunks with a max chunk size of 250 lines
## `apply_patch`
Use the `apply_patch` shell command to edit files.
[...detailed patch format instructions...]
Key elements: the agent's identity and environment (a coding agent running in a terminal-based CLI), its capabilities (receiving context, streaming thinking and responses, emitting function calls), and per-tool guidelines such as preferring `rg` for searches, reading files in chunks, and using `apply_patch` for edits.
Tools are defined as JSON schemas that the model receives with each request:
{
"type": "function",
"name": "shell",
"description": "Runs a shell command and returns its output.",
"parameters": {
"type": "object",
"properties": {
"command": {
"type": "array",
"items": { "type": "string" },
"description": "The command to execute"
},
"workdir": {
"type": "string",
"description": "The working directory"
},
"timeout_ms": {
"type": "number",
"description": "Timeout in milliseconds"
}
},
"required": ["command"],
"additionalProperties": false
}
}
Key schema elements:
- `name`: Tool identifier the model uses to invoke it
- `description`: Explains what the tool does and when to use it
- `parameters`: JSON Schema defining expected arguments
- `required`: Which parameters must be provided

Codex constructs the API request with all components:
pub struct Prompt {
/// Conversation history (user messages, assistant responses, tool outputs)
pub input: Vec<ResponseItem>,
/// Tool specifications
pub tools: Vec<ToolSpec>,
/// Whether parallel tool calls are permitted
pub parallel_tool_calls: bool,
/// Base instructions (system prompt)
pub base_instructions_override: Option<String>,
}
impl Prompt {
pub fn get_full_instructions(&self, model: &ModelFamily) -> Cow<str> {
let base = self.base_instructions_override
.as_deref()
.unwrap_or(model.base_instructions.deref());
// Add tool-specific instructions for models that need them
if model.needs_special_apply_patch_instructions {
Cow::Owned(format!("{base}\n{APPLY_PATCH_TOOL_INSTRUCTIONS}"))
} else {
Cow::Borrowed(base)
}
}
}
The request is sent to the model API (OpenAI Responses API format):
pub struct ResponsesApiRequest<'a> {
pub model: &'a str,
pub instructions: &'a str, // System prompt
pub input: &'a [ResponseItem], // Conversation history
pub tools: &'a [serde_json::Value], // Tool definitions
pub tool_choice: &'a str, // "auto", "none", "required"
pub parallel_tool_calls: bool,
pub stream: bool,
}
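A hedged sketch of how a Prompt might be turned into this request; the helper name and exact wiring are illustrative, not the actual Codex client code.

// Illustrative: assemble a streaming request from a Prompt.
// The function name and wiring are assumptions for this guide.
fn build_request<'a>(
    model: &'a str,
    instructions: &'a str,          // output of prompt.get_full_instructions(...)
    prompt: &'a Prompt,
    tools_json: &'a [serde_json::Value],
) -> ResponsesApiRequest<'a> {
    ResponsesApiRequest {
        model,
        instructions,
        input: &prompt.input,                   // conversation history so far
        tools: tools_json,                      // serialized tool schemas
        tool_choice: "auto",                    // let the model decide when to call tools
        parallel_tool_calls: prompt.parallel_tool_calls,
        stream: true,                           // responses are consumed as SSE
    }
}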
With this request, the model sees the full system prompt, the conversation history so far, and every available tool with its schema. It then decides whether to respond with plain text, call one or more tools, or do both in the same turn.
When the model decides to call a tool, it outputs structured data:
{
"type": "function_call",
"id": "fc_abc123",
"call_id": "call_xyz789",
"name": "shell",
"arguments": "{\"command\":[\"ls\",\"-la\"],\"workdir\":\"/home/user\"}"
}
Key fields:
- `type`: Identifies this as a tool call
- `name`: Which tool to invoke
- `arguments`: JSON string of parameters
- `call_id`: Unique identifier to match the call with its output

OpenAI also supports a `local_shell` built-in tool type:
{
"type": "local_shell_call",
"id": "ls_123",
"call_id": "call_456",
"status": "in_progress",
"action": {
"type": "exec",
"command": ["ls", "-la"],
"working_directory": "/home/user",
"timeout_ms": 30000
}
}
Codex defines an enum to handle all possible response types:
pub enum ResponseItem {
// Text message from the model
Message {
id: Option<String>,
role: String,
content: Vec<ContentItem>,
},
// Model's reasoning/thinking (for reasoning models)
Reasoning {
id: String,
summary: Vec<ReasoningItemReasoningSummary>,
content: Option<Vec<ReasoningItemContent>>,
},
// Standard function call
FunctionCall {
id: Option<String>,
name: String,
arguments: String, // JSON string
call_id: String,
},
// OpenAI's local shell tool
LocalShellCall {
id: Option<String>,
call_id: Option<String>,
status: LocalShellStatus,
action: LocalShellAction,
},
// Custom/freeform tool call
CustomToolCall {
id: Option<String>,
status: Option<String>,
call_id: String,
name: String,
input: String,
},
// Web search (another built-in)
WebSearchCall {
id: Option<String>,
status: Option<String>,
action: WebSearchAction,
},
// Tool output (when returning results to model)
FunctionCallOutput {
call_id: String,
output: FunctionCallOutputPayload,
},
// And more...
}
The model response is streamed as Server-Sent Events (SSE). Each event is parsed:
pub async fn handle_output_item_done(
ctx: &mut HandleOutputCtx,
item: ResponseItem,
) -> Result<OutputItemResult> {
let mut output = OutputItemResult::default();
// Try to build a tool call from the response item
match ToolRouter::build_tool_call(ctx.sess.as_ref(), item.clone()).await {
// It's a tool call - queue execution
Ok(Some(call)) => {
tracing::info!("ToolCall: {} {}", call.tool_name, call.payload.log_payload());
// Record in conversation history
ctx.sess.record_conversation_items(&ctx.turn_context, &[&item]).await;
// Create async task for tool execution
let tool_future = Box::pin(async move {
tool_runtime.handle_tool_call(call, cancellation_token).await
});
output.needs_follow_up = true;
output.tool_future = Some(tool_future);
}
// Not a tool call - just a message
Ok(None) => {
ctx.sess.record_conversation_items(&ctx.turn_context, &[&item]).await;
output.last_agent_message = last_assistant_message_from_item(&item);
}
// Error handling...
Err(e) => { /* ... */ }
}
Ok(output)
}
The router converts response items into executable tool calls:
impl ToolRouter {
pub async fn build_tool_call(
session: &Session,
item: ResponseItem,
) -> Result<Option<ToolCall>, FunctionCallError> {
match item {
// Standard function call
ResponseItem::FunctionCall { name, arguments, call_id, .. } => {
// Check if it's an MCP tool (prefixed with server name)
if let Some((server, tool)) = session.parse_mcp_tool_name(&name).await {
Ok(Some(ToolCall {
tool_name: name,
call_id,
payload: ToolPayload::Mcp { server, tool, raw_arguments: arguments },
}))
} else {
Ok(Some(ToolCall {
tool_name: name,
call_id,
payload: ToolPayload::Function { arguments },
}))
}
}
// OpenAI's local shell
ResponseItem::LocalShellCall { id, call_id, action, .. } => {
let call_id = call_id.or(id)
.ok_or(FunctionCallError::MissingLocalShellCallId)?;
match action {
LocalShellAction::Exec(exec) => {
let params = ShellToolCallParams {
command: exec.command,
workdir: exec.working_directory,
timeout_ms: exec.timeout_ms,
// ...
};
Ok(Some(ToolCall {
tool_name: "local_shell".to_string(),
call_id,
payload: ToolPayload::LocalShell { params },
}))
}
}
}
// Custom tool call
ResponseItem::CustomToolCall { name, input, call_id, .. } => {
Ok(Some(ToolCall {
tool_name: name,
call_id,
payload: ToolPayload::Custom { input },
}))
}
// Not a tool call
_ => Ok(None),
}
}
}
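The ToolPayload enum used above is not shown in full here; a simplified sketch of its shape, reconstructed from the variants that appear in this guide (the real Codex enum may carry more data):

// Simplified sketch of ToolPayload, inferred from the variants used above.
pub enum ToolPayload {
    // Plain function call: arguments arrive as a raw JSON string
    Function { arguments: String },
    // OpenAI's local_shell built-in: parameters already parsed
    LocalShell { params: ShellToolCallParams },
    // MCP tool call routed to an external server
    Mcp { server: String, tool: String, raw_arguments: String },
    // Custom/freeform tool call with free-text input
    Custom { input: String },
}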
Tool arguments come as a JSON string and need to be parsed:
impl ToolHandler for ShellHandler {
async fn handle(&self, invocation: ToolInvocation) -> Result<ToolOutput, FunctionCallError> {
match invocation.payload {
ToolPayload::Function { arguments } => {
// Parse JSON arguments into typed struct
let params: ShellToolCallParams = serde_json::from_str(&arguments)
.map_err(|e| FunctionCallError::RespondToModel(
format!("failed to parse function arguments: {e:?}")
))?;
// Execute with parsed parameters
self.run_shell(params, invocation.turn.as_ref()).await
}
ToolPayload::LocalShell { params } => {
// Already parsed by build_tool_call
self.run_shell(params, invocation.turn.as_ref()).await
}
_ => Err(FunctionCallError::RespondToModel("unsupported payload".into())),
}
}
}
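The target of that serde_json::from_str call is a plain Deserialize struct mirroring the tool schema. A simplified sketch (Codex's real ShellToolCallParams has additional fields):

use serde::Deserialize;

// Simplified sketch of the parameter struct the shell handler parses into.
// Field names mirror the shell tool schema shown earlier; optional fields
// stay Option so missing arguments still deserialize cleanly.
#[derive(Debug, Deserialize)]
struct ShellToolCallParams {
    command: Vec<String>,
    workdir: Option<String>,
    timeout_ms: Option<u64>,
}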
After parsing, the tool is executed:
// Tool handler executes the command
let output = handler.handle(invocation).await?;
// Convert to response format
let response = output.into_response(&call_id, &payload);
Results must be formatted so the model can understand them:
pub enum ToolOutput {
Function {
content: String, // Plain text result
content_items: Option<Vec<ContentItem>>, // Structured content
success: Option<bool>,
},
Mcp {
result: Result<CallToolResult, String>,
},
}
impl ToolOutput {
pub fn into_response(self, call_id: &str, payload: &ToolPayload) -> ResponseInputItem {
match self {
ToolOutput::Function { content, content_items, success } => {
ResponseInputItem::FunctionCallOutput {
call_id: call_id.to_string(),
output: FunctionCallOutputPayload {
content,
content_items,
success,
},
}
}
// ...
}
}
}
For shell commands, output includes metadata:
pub fn format_exec_output_for_model(exec_output: &ExecToolCallOutput) -> String {
let payload = ExecOutput {
output: &formatted_output,
metadata: ExecMetadata {
exit_code: exec_output.exit_code,
duration_seconds: exec_output.duration.as_secs_f32(),
},
};
serde_json::to_string(&payload).expect("serialize")
}
Example output returned to model:
{
"output": "src/\npackage.json\nREADME.md",
"metadata": {
"exit_code": 0,
"duration_seconds": 0.1
}
}
Claude Code's sub-agents are separate agent instances that the main agent can delegate work to. Each one is defined by a Markdown file with YAML frontmatter:
---
name: code-reviewer
description: Reviews code for bugs, security vulnerabilities, and quality issues
tools: Glob, Grep, LS, Read, NotebookRead
model: sonnet
color: red
---
You are an expert code reviewer specializing in modern software development.
## Core Review Responsibilities
**Bug Detection**: Identify actual bugs - logic errors, null handling, race conditions...
**Code Quality**: Evaluate significant issues like code duplication, missing error handling...
## Output Guidance
For each issue, provide:
- Clear description with confidence score
- File path and line number
- Concrete fix suggestion
The main agent sees the agent description and decides when to use it:
description: Use this agent when user asks to "review code", "check for bugs",
"analyze my changes", or after completing a significant code change. Examples:
<example>
Context: User just implemented authentication
user: "I've added OAuth login"
assistant: "I'll use the code-reviewer agent to check the implementation."
<commentary>
Auth code written, trigger review for security and best practices.
</commentary>
</example>
When the main agent decides to use a sub-agent, it uses a special tool (Task tool in Claude Code):
assistant: "I'll use the code-reviewer agent to analyze your changes."
[Task tool invoked with agent-name and context]
The sub-agent then runs with its own system prompt, its own context, and only the tools listed in its definition, and reports its findings back to the main agent when it finishes.

Putting it all together, here is a complete round trip for a simple request. First, the request sent to the model, combining instructions, conversation input, and tool definitions:
{
"model": "gpt-5-codex",
"instructions": "You are a coding agent running in the Codex CLI...",
"input": [
{
"type": "message",
"role": "user",
"content": [{"type": "input_text", "text": "List files in the current directory"}]
}
],
"tools": [
{
"type": "function",
"name": "shell",
"description": "Runs a shell command...",
"parameters": {
"type": "object",
"properties": {
"command": {"type": "array", "items": {"type": "string"}}
},
"required": ["command"]
}
}
],
"tool_choice": "auto",
"parallel_tool_calls": true,
"stream": true
}
The model decides to run `ls -la` and streams the tool call back as SSE events, with the arguments arriving incrementally:
event: response.output_item.added
data: {"type":"function_call","name":"shell","call_id":"call_123","arguments":""}
event: response.function_call_arguments.delta
data: {"delta":"{\"command\":[\"ls\",\"-la\"]}"}
event: response.output_item.done
data: {"type":"function_call","name":"shell","call_id":"call_123","arguments":"{\"command\":[\"ls\",\"-la\"]}"}
The agent parses the completed function call, executes the command, and returns the result to the model:
{
"type": "function_call_output",
"call_id": "call_123",
"output": "Exit code: 0\nWall time: 0.1 seconds\nOutput:\ntotal 24\ndrwxr-xr-x 5 user user 160 Jan 1 12:00 .\n..."
}
The tool calling flow involves:
- Instructing the model through a system prompt and JSON tool schemas
- Sending the conversation history and tool definitions with every request
- Parsing streamed response items into typed tool calls
- Executing tools and returning structured outputs matched by call_id
- Looping until the model responds without requesting further tools

For sub-agents (Claude Code):
- The main agent chooses a sub-agent based on its description
- It invokes the sub-agent through a dedicated tool (the Task tool)
- The sub-agent runs with its own prompt and a restricted tool set, then reports its results back to the main agent
This architecture enables powerful, extensible tool systems where the LLM becomes an intelligent orchestrator of capabilities.