This guide explains the complete flow of tool calling in LLM agents: how the model is instructed about available tools, how it decides to call them, and how its responses are parsed and executed.
┌──────────────────────────────────────────────────────────────────────┐
│ TOOL CALLING FLOW │
├──────────────────────────────────────────────────────────────────────┤
│ │
│ 1. SETUP: Instruct the model about available tools │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ System Prompt│ │ Tool Schemas │ │ User Context │ │
│ │ (behavior) │ │ (JSON specs) │ │ (AGENTS.md) │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ └─────────────────┼─────────────────┘ │
│ ▼ │
│ 2. REQUEST: Send to model API with tool definitions │
│ ┌────────────────────────────────────────┐ │
│ │ POST /v1/responses │ │
│ │ { │ │
│ │ instructions: "...", │ │
│ │ input: [...messages...], │ │
│ │ tools: [...tool_specs...] │ │
│ │ } │ │
│ └────────────────────┬───────────────────┘ │
│ ▼ │
│ 3. RESPONSE: Model streams response with tool calls │
│ ┌────────────────────────────────────────┐ │
│ │ { │ │
│ │ type: "function_call", │ │
│ │ name: "shell", │ │
│ │ arguments: "{\"command\":[...]}", │ │
│ │ call_id: "call_abc123" │ │
│ │ } │ │
│ └────────────────────┬───────────────────┘ │
│ ▼ │
│ 4. EXECUTE: Parse response, execute tool, return result │
│ ┌────────────────────────────────────────┐ │
│ │ { │ │
│ │ type: "function_call_output", │ │
│ │ call_id: "call_abc123", │ │
│ │ output: "Exit code: 0\n..." │ │
│ │ } │ │
│ └────────────────────┬───────────────────┘ │
│ ▼ │
│ 5. CONTINUE: Model processes result, may call more tools │
│ │
└──────────────────────────────────────────────────────────────────────┘
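Before looking at each step in detail, here is a minimal sketch of the outer loop. It assumes the ResponseItem types introduced later in this guide; send_request and execute_tool are hypothetical stand-ins for the client and tool-execution code, not Codex APIs.

// Minimal sketch of the agent loop from the diagram above.
// send_request and execute_tool are hypothetical helpers, not Codex APIs.
async fn run_turn(
    mut history: Vec<ResponseItem>,
    tools: &[serde_json::Value],
) -> Result<(), Box<dyn std::error::Error>> {
    loop {
        // 2. REQUEST: send conversation history plus tool definitions
        let items = send_request(&history, tools).await?;

        // 3. RESPONSE: scan the model's output for tool calls
        let mut called_tool = false;
        for item in items {
            history.push(item.clone());
            if let ResponseItem::FunctionCall { name, arguments, call_id, .. } = item {
                // 4. EXECUTE: run the tool and append its output for the model
                let output = execute_tool(&name, &arguments).await?;
                history.push(ResponseItem::FunctionCallOutput { call_id, output });
                called_tool = true;
            }
        }

        // 5. CONTINUE: loop again only if the model asked for tool results
        if !called_tool {
            return Ok(());
        }
    }
}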
The system prompt establishes the agent's identity, capabilities, and behavioral guidelines. Here's how Codex does it:
You are a coding agent running in the Codex CLI, a terminal-based coding assistant.
Your capabilities:
- Receive user prompts and other context provided by the harness
- Communicate with the user by streaming thinking & responses
- Emit function calls to run terminal commands and apply patches
# Tool Guidelines
## Shell commands
When using the shell, you must adhere to the following guidelines:
- When searching for text or files, prefer using `rg` because it's faster
- Read files in chunks with a max chunk size of 250 lines
## `apply_patch`
Use the `apply_patch` shell command to edit files.
[...detailed patch format instructions...]
Key elements: the agent's identity and environment (a coding agent running in a terminal-based CLI), its capabilities (receiving context, streaming thinking and responses, emitting function calls), and per-tool guidelines such as preferring `rg` for searches, reading files in chunks, and using `apply_patch` for edits.
Tools are defined as JSON schemas that the model receives with each request:
{
"type": "function",
"name": "shell",
"description": "Runs a shell command and returns its output.",
"parameters": {
"type": "object",
"properties": {
"command": {
"type": "array",
"items": { "type": "string" },
"description": "The command to execute"
},
"workdir": {
"type": "string",
"description": "The working directory"
},
"timeout_ms": {
"type": "number",
"description": "Timeout in milliseconds"
}
},
"required": ["command"],
"additionalProperties": false
}
}
Key schema elements:
- `name`: Tool identifier the model uses to invoke it
- `description`: Explains what the tool does and when to use it
- `parameters`: JSON Schema defining expected arguments
- `required`: Which parameters must be provided

Codex constructs the API request with all components:
pub struct Prompt {
/// Conversation history (user messages, assistant responses, tool outputs)
pub input: Vec<ResponseItem>,
/// Tool specifications
pub tools: Vec<ToolSpec>,
/// Whether parallel tool calls are permitted
pub parallel_tool_calls: bool,
/// Base instructions (system prompt)
pub base_instructions_override: Option<String>,
}
impl Prompt {
pub fn get_full_instructions(&self, model: &ModelFamily) -> Cow<str> {
let base = self.base_instructions_override
.as_deref()
.unwrap_or(model.base_instructions.deref());
// Add tool-specific instructions for models that need them
if model.needs_special_apply_patch_instructions {
Cow::Owned(format!("{base}\n{APPLY_PATCH_TOOL_INSTRUCTIONS}"))
} else {
Cow::Borrowed(base)
}
}
}
The request is sent to the model API (OpenAI Responses API format):
pub struct ResponsesApiRequest<'a> {
pub model: &'a str,
pub instructions: &'a str, // System prompt
pub input: &'a [ResponseItem], // Conversation history
pub tools: &'a [serde_json::Value], // Tool definitions
pub tool_choice: &'a str, // "auto", "none", "required"
pub parallel_tool_calls: bool,
pub stream: bool,
}
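A hedged sketch of how a Prompt might be turned into this request; the helper name and exact wiring are illustrative, not the actual Codex client code.

// Illustrative: assemble a streaming request from a Prompt.
// The function name and wiring are assumptions for this guide.
fn build_request<'a>(
    model: &'a str,
    instructions: &'a str,          // output of prompt.get_full_instructions(...)
    prompt: &'a Prompt,
    tools_json: &'a [serde_json::Value],
) -> ResponsesApiRequest<'a> {
    ResponsesApiRequest {
        model,
        instructions,
        input: &prompt.input,                   // conversation history so far
        tools: tools_json,                      // serialized tool schemas
        tool_choice: "auto",                    // let the model decide when to call tools
        parallel_tool_calls: prompt.parallel_tool_calls,
        stream: true,                           // responses are consumed as SSE
    }
}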
With this request, the model sees the full system prompt, the conversation history so far, and every available tool with its schema. It then decides whether to respond with plain text, call one or more tools, or do both in the same turn.
When the model decides to call a tool, it outputs structured data:
{
"type": "function_call",
"id": "fc_abc123",
"call_id": "call_xyz789",
"name": "shell",
"arguments": "{\"command\":[\"ls\",\"-la\"],\"workdir\":\"/home/user\"}"
}
Key fields:
- `type`: Identifies this as a tool call
- `name`: Which tool to invoke
- `arguments`: JSON string of parameters
- `call_id`: Unique identifier to match the call with its output

OpenAI also supports a `local_shell` built-in tool type:
{
"type": "local_shell_call",
"id": "ls_123",
"call_id": "call_456",
"status": "in_progress",
"action": {
"type": "exec",
"command": ["ls", "-la"],
"working_directory": "/home/user",
"timeout_ms": 30000
}
}
Codex defines an enum to handle all possible response types:
pub enum ResponseItem {
// Text message from the model
Message {
id: Option<String>,
role: String,
content: Vec<ContentItem>,
},
// Model's reasoning/thinking (for reasoning models)
Reasoning {
id: String,
summary: Vec<ReasoningItemReasoningSummary>,
content: Option<Vec<ReasoningItemContent>>,
},
// Standard function call
FunctionCall {
id: Option<String>,
name: String,
arguments: String, // JSON string
call_id: String,
},
// OpenAI's local shell tool
LocalShellCall {
id: Option<String>,
call_id: Option<String>,
status: LocalShellStatus,
action: LocalShellAction,
},
// Custom/freeform tool call
CustomToolCall {
id: Option<String>,
status: Option<String>,
call_id: String,
name: String,
input: String,
},
// Web search (another built-in)
WebSearchCall {
id: Option<String>,
status: Option<String>,
action: WebSearchAction,
},
// Tool output (when returning results to model)
FunctionCallOutput {
call_id: String,
output: FunctionCallOutputPayload,
},
// And more...
}
The model response is streamed as Server-Sent Events (SSE). Each event is parsed:
pub async fn handle_output_item_done(
ctx: &mut HandleOutputCtx,
item: ResponseItem,
) -> Result<OutputItemResult> {
let mut output = OutputItemResult::default();
// Try to build a tool call from the response item
match ToolRouter::build_tool_call(ctx.sess.as_ref(), item.clone()).await {
// It's a tool call - queue execution
Ok(Some(call)) => {
tracing::info!("ToolCall: {} {}", call.tool_name, call.payload.log_payload());
// Record in conversation history
ctx.sess.record_conversation_items(&ctx.turn_context, &[&item]).await;
// Create async task for tool execution
let tool_future = Box::pin(async move {
tool_runtime.handle_tool_call(call, cancellation_token).await
});
output.needs_follow_up = true;
output.tool_future = Some(tool_future);
}
// Not a tool call - just a message
Ok(None) => {
ctx.sess.record_conversation_items(&ctx.turn_context, &[&item]).await;
output.last_agent_message = last_assistant_message_from_item(&item);
}
// Error handling...
Err(e) => { /* ... */ }
}
Ok(output)
}
The router converts response items into executable tool calls:
impl ToolRouter {
pub async fn build_tool_call(
session: &Session,
item: ResponseItem,
) -> Result<Option<ToolCall>, FunctionCallError> {
match item {
// Standard function call
ResponseItem::FunctionCall { name, arguments, call_id, .. } => {
// Check if it's an MCP tool (prefixed with server name)
if let Some((server, tool)) = session.parse_mcp_tool_name(&name).await {
Ok(Some(ToolCall {
tool_name: name,
call_id,
payload: ToolPayload::Mcp { server, tool, raw_arguments: arguments },
}))
} else {
Ok(Some(ToolCall {
tool_name: name,
call_id,
payload: ToolPayload::Function { arguments },
}))
}
}
// OpenAI's local shell
ResponseItem::LocalShellCall { id, call_id, action, .. } => {
let call_id = call_id.or(id)
.ok_or(FunctionCallError::MissingLocalShellCallId)?;
match action {
LocalShellAction::Exec(exec) => {
let params = ShellToolCallParams {
command: exec.command,
workdir: exec.working_directory,
timeout_ms: exec.timeout_ms,
// ...
};
Ok(Some(ToolCall {
tool_name: "local_shell".to_string(),
call_id,
payload: ToolPayload::LocalShell { params },
}))
}
}
}
// Custom tool call
ResponseItem::CustomToolCall { name, input, call_id, .. } => {
Ok(Some(ToolCall {
tool_name: name,
call_id,
payload: ToolPayload::Custom { input },
}))
}
// Not a tool call
_ => Ok(None),
}
}
}
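The ToolPayload enum used above is not shown in full here; a simplified sketch of its shape, reconstructed from the variants that appear in this guide (the real Codex enum may carry more data):

// Simplified sketch of ToolPayload, inferred from the variants used above.
pub enum ToolPayload {
    // Plain function call: arguments arrive as a raw JSON string
    Function { arguments: String },
    // OpenAI's local_shell built-in: parameters already parsed
    LocalShell { params: ShellToolCallParams },
    // MCP tool call routed to an external server
    Mcp { server: String, tool: String, raw_arguments: String },
    // Custom/freeform tool call with free-text input
    Custom { input: String },
}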
Tool arguments come as a JSON string and need to be parsed:
impl ToolHandler for ShellHandler {
async fn handle(&self, invocation: ToolInvocation) -> Result<ToolOutput, FunctionCallError> {
match invocation.payload {
ToolPayload::Function { arguments } => {
// Parse JSON arguments into typed struct
let params: ShellToolCallParams = serde_json::from_str(&arguments)
.map_err(|e| FunctionCallError::RespondToModel(
format!("failed to parse function arguments: {e:?}")
))?;
// Execute with parsed parameters
self.run_shell(params, invocation.turn.as_ref()).await
}
ToolPayload::LocalShell { params } => {
// Already parsed by build_tool_call
self.run_shell(params, invocation.turn.as_ref()).await
}
_ => Err(FunctionCallError::RespondToModel("unsupported payload".into())),
}
}
}
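The target of that serde_json::from_str call is a plain Deserialize struct mirroring the tool schema. A simplified sketch (Codex's real ShellToolCallParams has additional fields):

use serde::Deserialize;

// Simplified sketch of the parameter struct the shell handler parses into.
// Field names mirror the shell tool schema shown earlier; optional fields
// stay Option so missing arguments still deserialize cleanly.
#[derive(Debug, Deserialize)]
struct ShellToolCallParams {
    command: Vec<String>,
    workdir: Option<String>,
    timeout_ms: Option<u64>,
}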
After parsing, the tool is executed:
// Tool handler executes the command
let output = handler.handle(invocation).await?;
// Convert to response format
let response = output.into_response(&call_id, &payload);
Results must be formatted so the model can understand them:
pub enum ToolOutput {
Function {
content: String, // Plain text result
content_items: Option<Vec<ContentItem>>, // Structured content
success: Option<bool>,
},
Mcp {
result: Result<CallToolResult, String>,
},
}
impl ToolOutput {
pub fn into_response(self, call_id: &str, payload: &ToolPayload) -> ResponseInputItem {
match self {
ToolOutput::Function { content, content_items, success } => {
ResponseInputItem::FunctionCallOutput {
call_id: call_id.to_string(),
output: FunctionCallOutputPayload {
content,
content_items,
success,
},
}
}
// ...
}
}
}
For shell commands, output includes metadata:
pub fn format_exec_output_for_model(exec_output: &ExecToolCallOutput) -> String {
let payload = ExecOutput {
output: &formatted_output,
metadata: ExecMetadata {
exit_code: exec_output.exit_code,
duration_seconds: exec_output.duration.as_secs_f32(),
},
};
serde_json::to_string(&payload).expect("serialize")
}
Example output returned to model:
{
"output": "src/\npackage.json\nREADME.md",
"metadata": {
"exit_code": 0,
"duration_seconds": 0.1
}
}
Claude Code's sub-agents are separate agent instances that the main agent can delegate work to. Each one is defined by a Markdown file with YAML frontmatter:
---
name: code-reviewer
description: Reviews code for bugs, security vulnerabilities, and quality issues
tools: Glob, Grep, LS, Read, NotebookRead
model: sonnet
color: red
---
You are an expert code reviewer specializing in modern software development.
## Core Review Responsibilities
**Bug Detection**: Identify actual bugs - logic errors, null handling, race conditions...
**Code Quality**: Evaluate significant issues like code duplication, missing error handling...
## Output Guidance
For each issue, provide:
- Clear description with confidence score
- File path and line number
- Concrete fix suggestion
The main agent sees the agent description and decides when to use it:
description: Use this agent when user asks to "review code", "check for bugs",
"analyze my changes", or after completing a significant code change. Examples:
<example>
Context: User just implemented authentication
user: "I've added OAuth login"
assistant: "I'll use the code-reviewer agent to check the implementation."
<commentary>
Auth code written, trigger review for security and best practices.
</commentary>
</example>
When the main agent decides to use a sub-agent, it uses a special tool (Task tool in Claude Code):
assistant: "I'll use the code-reviewer agent to analyze your changes."
[Task tool invoked with agent-name and context]
The sub-agent then runs with its own system prompt, its own context, and only the tools listed in its definition, and reports its findings back to the main agent when it finishes.

Putting it all together, here is a complete round trip for a simple request. First, the request sent to the model, combining instructions, conversation input, and tool definitions:
{
"model": "gpt-5-codex",
"instructions": "You are a coding agent running in the Codex CLI...",
"input": [
{
"type": "message",
"role": "user",
"content": [{"type": "input_text", "text": "List files in the current directory"}]
}
],
"tools": [
{
"type": "function",
"name": "shell",
"description": "Runs a shell command...",
"parameters": {
"type": "object",
"properties": {
"command": {"type": "array", "items": {"type": "string"}}
},
"required": ["command"]
}
}
],
"tool_choice": "auto",
"parallel_tool_calls": true,
"stream": true
}
The model decides to run `ls -la` and streams the tool call back as SSE events, with the arguments arriving incrementally:
event: response.output_item.added
data: {"type":"function_call","name":"shell","call_id":"call_123","arguments":""}
event: response.function_call_arguments.delta
data: {"delta":"{\"command\":[\"ls\",\"-la\"]}"}
event: response.output_item.done
data: {"type":"function_call","name":"shell","call_id":"call_123","arguments":"{\"command\":[\"ls\",\"-la\"]}"}
The agent parses the completed function call, executes the command, and returns the result to the model:
{
"type": "function_call_output",
"call_id": "call_123",
"output": "Exit code: 0\nWall time: 0.1 seconds\nOutput:\ntotal 24\ndrwxr-xr-x 5 user user 160 Jan 1 12:00 .\n..."
}
The tool calling flow involves:
- Instructing the model through a system prompt and JSON tool schemas
- Sending the conversation history and tool definitions with every request
- Parsing streamed response items into typed tool calls
- Executing tools and returning structured outputs matched by call_id
- Looping until the model responds without requesting further tools

For sub-agents (Claude Code):
- The main agent chooses a sub-agent based on its description
- It invokes the sub-agent through a dedicated tool (the Task tool)
- The sub-agent runs with its own prompt and a restricted tool set, then reports its results back to the main agent
This architecture enables powerful, extensible tool systems where the LLM becomes an intelligent orchestrator of capabilities.