This guide explains how LLM agents gather, structure, manage, and optimize context for effective model interactions. Context is the lifeblood of an agent—the model can only reason about what it sees.
┌─────────────────────────────────────────────────────────────────────────────┐
│ CONTEXT PIPELINE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ 1. STATIC CONTEXT │ │
│ │ (loaded once at session start) │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────────┐ │ │
│ │ │ System Prompt│ │ AGENTS.md │ │ Skills/Capabilities │ │ │
│ │ │ (behavior) │ │ (project) │ │ (available helpers) │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ 2. ENVIRONMENT CONTEXT │ │
│ │ (injected per-turn) │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────────┐ │ │
│ │ │ CWD, Shell │ │ Permissions │ │ Sandbox Mode │ │ │
│ │ │ │ │ (approval) │ │ (read-only, write, etc.) │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ 3. CONVERSATION HISTORY │ │
│ │ (accumulated over session) │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────────┐ │ │
│ │ │ User Messages│ │ Assistant │ │ Tool Calls & Outputs │ │ │
│ │ │ │ │ Responses │ │ │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ 4. CONTEXT MANAGEMENT │ │
│ │ (keep within context window) │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────────┐ │ │
│ │ │ Truncation │ │ Compaction │ │ Token Tracking │ │ │
│ │ │ │ │ (summarize) │ │ │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
The system prompt establishes the agent's identity and capabilities. It's loaded once at session start:
// From Codex's model_family.rs
pub struct ModelFamily {
/// Base system prompt for this model family
pub base_instructions: Arc<str>,
/// Whether this model needs special tool instructions
pub needs_special_apply_patch_instructions: bool,
/// Default truncation policy for tool outputs
pub truncation_policy: TruncationPolicy,
}
// The prompt is loaded from markdown files
pub const BASE_PROMPT: &str = include_str!("../prompt.md");
The system prompt typically includes the agent's identity, behavioral guidelines, and instructions for how to use its tools.
Codex discovers project-specific instructions by walking up the directory tree:
/// Discovery algorithm for AGENTS.md files
pub fn discover_project_doc_paths(config: &Config) -> std::io::Result<Vec<PathBuf>> {
let mut dir = config.cwd.clone();
// Build chain from cwd upwards and detect git root
let mut chain: Vec<PathBuf> = vec![dir.clone()];
let mut git_root: Option<PathBuf> = None;
while let Some(parent) = dir.parent() {
// Check for .git marker
let git_marker = dir.join(".git");
if git_marker.exists() {
git_root = Some(dir.clone());
break;
}
chain.push(parent.to_path_buf());
dir = parent.to_path_buf();
}
// Search from git root down to cwd
let search_dirs = if let Some(root) = git_root {
chain.iter()
.rev()
.skip_while(|p| *p != &root)
.cloned()
.collect()
} else {
vec![config.cwd.clone()]
};
// Look for AGENTS.md or fallbacks in each directory
let mut found = Vec::new();
let candidates = ["AGENTS.override.md", "AGENTS.md"];
for dir in search_dirs {
for name in &candidates {
let path = dir.join(name);
if path.is_file() {
found.push(path);
break; // Only one per directory
}
}
}
Ok(found)
}
Discovery rules:
- Codex walks up from the cwd until it finds the repository root (a .git marker)
- AGENTS.md files are loaded from the root down to the cwd
- AGENTS.override.md takes precedence over AGENTS.md in each directory

Codex supports "skills" - reusable instruction bundles discovered at startup:
pub fn render_skills_section(skills: &[SkillMetadata]) -> Option<String> {
if skills.is_empty() {
return None;
}
let mut lines = Vec::new();
lines.push("## Skills".to_string());
lines.push("These skills are discovered at startup from ~/.codex/skills...".to_string());
for skill in skills {
lines.push(format!(
"- {}: {} (file: {})",
skill.name, skill.description, skill.path.display()
));
}
// Add usage rules
lines.push(SKILL_USAGE_RULES.to_string());
Some(lines.join("\n"))
}
Skills are referenced but not inlined - the model is told where to find them and loads details on-demand to keep context lean.
All static context is merged together:
pub async fn get_user_instructions(config: &Config) -> Option<String> {
// Load skills (names + descriptions, not full content)
let skills_section = if config.features.enabled(Feature::Skills) {
let skills = load_skills(config);
render_skills_section(&skills.skills)
} else {
None
};
// Load project docs (AGENTS.md files); None when none are found.
// Note: using `?` here would abort the whole function and drop skills
// and base instructions whenever no project docs exist.
let project_docs = read_project_docs(config).await;
// Merge project docs with skills
let combined = merge_project_docs_with_skills(project_docs, skills_section);
// Combine with any base user instructions
let mut parts = Vec::new();
if let Some(instructions) = config.user_instructions.clone() {
parts.push(instructions);
}
if let Some(project_doc) = combined {
if !parts.is_empty() {
parts.push("\n\n--- project-doc ---\n\n".to_string());
}
parts.push(project_doc);
}
if parts.is_empty() { None } else { Some(parts.concat()) }
}
Each turn includes environment information so the model knows its current state:
pub struct EnvironmentContext {
pub cwd: Option<PathBuf>, // Current working directory
pub approval_policy: Option<AskForApproval>, // Permission level
pub sandbox_mode: Option<SandboxMode>, // Sandbox restrictions
pub network_access: Option<NetworkAccess>, // Network availability
pub writable_roots: Option<Vec<PathBuf>>, // Allowed write paths
pub shell: Shell, // Shell type (bash, zsh, etc.)
}
impl EnvironmentContext {
/// Serialize to XML for model consumption
pub fn serialize_to_xml(self) -> String {
let mut lines = vec!["<environment_context>".to_string()];
if let Some(cwd) = self.cwd {
lines.push(format!(" <cwd>{}</cwd>", cwd.display()));
}
if let Some(policy) = self.approval_policy {
lines.push(format!(" <approval_policy>{}</approval_policy>", policy));
}
if let Some(mode) = self.sandbox_mode {
lines.push(format!(" <sandbox_mode>{}</sandbox_mode>", mode));
}
// ... more fields
lines.push("</environment_context>".to_string());
lines.join("\n")
}
}
Example environment context sent to model:
<environment_context>
<cwd>/home/user/my-project</cwd>
<approval_policy>on-request</approval_policy>
<sandbox_mode>workspace-write</sandbox_mode>
<network_access>restricted</network_access>
<writable_roots>
<root>/home/user/my-project</root>
<root>/tmp</root>
</writable_roots>
<shell>bash</shell>
</environment_context>
Project-specific instructions are formatted specially:
impl From<UserInstructions> for ResponseItem {
fn from(ui: UserInstructions) -> Self {
ResponseItem::Message {
role: "user".to_string(),
content: vec![ContentItem::InputText {
text: format!(
"# AGENTS.md instructions for {directory}\n\n<INSTRUCTIONS>\n{contents}\n</INSTRUCTIONS>",
directory = ui.directory,
contents = ui.text
),
}],
}
}
}
The context manager tracks the full conversation:
pub struct ContextManager {
/// Items ordered from oldest to newest
items: Vec<ResponseItem>,
/// Token usage information
token_info: Option<TokenUsageInfo>,
}
impl ContextManager {
/// Record new items into history
pub fn record_items<I>(&mut self, items: I, policy: TruncationPolicy)
where
I: IntoIterator<Item = ResponseItem>,
{
for item in items {
// Skip non-API items
if !is_api_message(&item) {
continue;
}
// Process (potentially truncate) the item
let processed = self.process_item(&item, policy);
self.items.push(processed);
}
}
/// Get history prepared for sending to model
pub fn get_history_for_prompt(&mut self) -> Vec<ResponseItem> {
// Normalize: ensure all tool calls have outputs
self.normalize_history();
// Remove internal items (like GhostSnapshots)
let mut history = self.contents();
Self::remove_ghost_snapshots(&mut history);
history
}
}
fn is_api_message(message: &ResponseItem) -> bool {
match message {
// User and assistant messages
ResponseItem::Message { role, .. } => role != "system",
// Tool interactions
ResponseItem::FunctionCall { .. } => true,
ResponseItem::FunctionCallOutput { .. } => true,
ResponseItem::CustomToolCall { .. } => true,
ResponseItem::CustomToolCallOutput { .. } => true,
ResponseItem::LocalShellCall { .. } => true,
// Model reasoning
ResponseItem::Reasoning { .. } => true,
// Web search results
ResponseItem::WebSearchCall { .. } => true,
// Compaction summaries
ResponseItem::CompactionSummary { .. } => true,
// Internal items excluded
ResponseItem::GhostSnapshot { .. } => false,
ResponseItem::Other => false,
}
}
The history must maintain invariants - every tool call needs a corresponding output:
fn normalize_history(&mut self) {
// Ensure all function/tool calls have corresponding outputs
normalize::ensure_call_outputs_present(&mut self.items);
// Remove orphaned outputs without corresponding calls
normalize::remove_orphan_outputs(&mut self.items);
}
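The steps above can be sketched against a simplified, hypothetical item type (the real `ResponseItem` is richer, and the actual `ensure_call_outputs_present` lives in Codex's `normalize` module):

```rust
// Simplified stand-in for ResponseItem, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Item {
    Call { call_id: String },
    Output { call_id: String },
}

/// Ensure every tool call is followed by some output for the same call_id,
/// inserting a synthetic output where one is missing.
fn ensure_call_outputs_present(items: &mut Vec<Item>) {
    let mut i = 0;
    while i < items.len() {
        // Compute whether items[i] is an unanswered call before mutating
        let pending = match &items[i] {
            Item::Call { call_id } => {
                let answered = items[i + 1..].iter().any(|item| {
                    matches!(item, Item::Output { call_id: c } if c == call_id)
                });
                if answered { None } else { Some(call_id.clone()) }
            }
            _ => None,
        };
        if let Some(call_id) = pending {
            // Insert a synthetic output right after the unanswered call
            items.insert(i + 1, Item::Output { call_id });
        }
        i += 1;
    }
}
```

A synthetic output for an unanswered call keeps the API contract intact even when a turn was interrupted mid-tool-call.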
Tool outputs can be large. Truncation prevents context overflow:
#[derive(Debug, Clone, Copy)]
pub enum TruncationPolicy {
Bytes(usize), // Truncate by byte count
Tokens(usize), // Truncate by token estimate
}
impl TruncationPolicy {
/// Approximate token budget
pub fn token_budget(&self) -> usize {
match self {
TruncationPolicy::Bytes(bytes) => bytes / 4, // ~4 bytes per token
TruncationPolicy::Tokens(tokens) => *tokens,
}
}
/// Approximate byte budget (the inverse heuristic)
pub fn byte_budget(&self) -> usize {
match self {
TruncationPolicy::Bytes(bytes) => *bytes,
TruncationPolicy::Tokens(tokens) => tokens * 4, // ~4 bytes per token
}
}
}
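The `approx_token_count` helper used throughout these excerpts is not shown; a minimal sketch consistent with the ~4 bytes/token heuristic would be:

```rust
/// Rough token estimate: ~4 bytes per token, rounded up.
/// A sketch only; the real approx_token_count in Codex is not shown here.
fn approx_token_count(s: &str) -> usize {
    s.len().div_ceil(4)
}
```

Byte-based estimates are deliberately cheap: they avoid running a tokenizer on every item while staying close enough to real counts to budget truncation and compaction.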
Codex preserves both the beginning and end of output:
fn truncate_with_byte_estimate(s: &str, policy: TruncationPolicy) -> String {
let max_bytes = policy.byte_budget();
if s.len() <= max_bytes {
return s.to_string();
}
// Split budget: half for beginning, half for end
let (left_budget, right_budget) = (max_bytes / 2, max_bytes - max_bytes / 2);
// Split string on UTF-8 boundaries
let (removed_chars, left, right) = split_string(s, left_budget, right_budget);
// Create truncation marker
let marker = format!("…{removed_chars} chars truncated…");
// Assemble: beginning + marker + end
format!("{left}{marker}{right}")
}
Example truncated output:
Total output lines: 5000
drwxr-xr-x 5 user user 160 Jan 1 12:00 .
drwxr-xr-x 3 user user 96 Jan 1 11:00 ..
-rw-r--r-- 1 user user 234 Jan 1 12:00 package.json
…4850 chars truncated…
-rw-r--r-- 1 user user 1234 Jan 1 12:00 README.md
-rw-r--r-- 1 user user 567 Jan 1 12:00 tsconfig.json
Tool outputs are truncated when recorded:
fn process_item(&self, item: &ResponseItem, policy: TruncationPolicy) -> ResponseItem {
match item {
ResponseItem::FunctionCallOutput { call_id, output } => {
// Truncate the content
let truncated = truncate_text(&output.content, policy);
// Also truncate any structured content items
let truncated_items = output.content_items.as_ref().map(|items| {
truncate_function_output_items_with_policy(items, policy)
});
ResponseItem::FunctionCallOutput {
call_id: call_id.clone(),
output: FunctionCallOutputPayload {
content: truncated,
content_items: truncated_items,
success: output.success,
},
}
}
// Other items pass through unchanged
_ => item.clone(),
}
}
When the context window fills up, Codex can compact the conversation into a summary:
pub async fn run_compact_task(
sess: Arc<Session>,
turn_context: Arc<TurnContext>,
input: Vec<UserInput>,
) {
// Get current history
let mut history = sess.clone_history().await;
// Iteratively remove oldest items if context is too large
let mut truncated_count = 0;
loop {
let turn_input = history.get_history_for_prompt();
// Build the summarization request from the remaining history (simplified)
let prompt = Prompt { input: turn_input, ..Default::default() };
match drain_to_completed(&sess, &turn_context, &prompt).await {
Ok(()) => break,
Err(CodexErr::ContextWindowExceeded) => {
// Remove oldest item and retry
history.remove_first_item();
truncated_count += 1;
}
Err(e) => {
// Handle other errors...
}
}
}
// Extract user messages for preservation
let history_snapshot = sess.clone_history().await.get_history();
let user_messages = collect_user_messages(&history_snapshot);
// Get summary from model
let summary_text = get_last_assistant_message_from_turn(&history_snapshot)
.unwrap_or_default();
// Build compacted history
let initial_context = sess.build_initial_context(&turn_context);
let new_history = build_compacted_history(
initial_context,
&user_messages,
&summary_text
);
// Replace history with compacted version
sess.replace_history(new_history).await;
}
fn build_compacted_history(
mut history: Vec<ResponseItem>,
user_messages: &[String],
summary_text: &str,
) -> Vec<ResponseItem> {
// Budget for preserved user messages
let max_tokens = 20_000;
let mut remaining = max_tokens;
let mut selected_messages = Vec::new();
// Keep recent user messages (working backwards)
for message in user_messages.iter().rev() {
if remaining == 0 { break; }
let tokens = approx_token_count(message);
if tokens <= remaining {
selected_messages.push(message.clone());
remaining -= tokens;
} else {
// Truncate and include partial
let truncated = truncate_text(message, TruncationPolicy::Tokens(remaining));
selected_messages.push(truncated);
break;
}
}
selected_messages.reverse();
// Add preserved user messages to history
for message in &selected_messages {
history.push(ResponseItem::Message {
role: "user".to_string(),
content: vec![ContentItem::InputText { text: message.clone() }],
});
}
// Add summary as the final message
history.push(ResponseItem::Message {
role: "user".to_string(),
content: vec![ContentItem::InputText {
text: format!("{SUMMARY_PREFIX}\n{summary_text}")
}],
});
history
}
Codex tracks token usage to anticipate when compaction is needed:
impl ContextManager {
/// Estimate token count using byte-based heuristics
pub fn estimate_token_count(&self, turn_context: &TurnContext) -> Option<i64> {
let model_family = turn_context.client.get_model_family();
// Base tokens from system prompt
let base_tokens = approx_token_count(&model_family.base_instructions) as i64;
// Sum tokens from all items
let items_tokens = self.items.iter().fold(0i64, |acc, item| {
acc + match item {
// Skip internal items
ResponseItem::GhostSnapshot { .. } => 0,
// Estimate reasoning (encrypted content)
ResponseItem::Reasoning { encrypted_content: Some(content), .. } => {
estimate_reasoning_length(content.len())
}
// Serialize and estimate other items
item => {
let serialized = serde_json::to_string(item).unwrap_or_default();
approx_token_count(&serialized) as i64
}
}
});
Some(base_tokens.saturating_add(items_tokens))
}
/// Update with actual token usage from API response
pub fn update_token_info(&mut self, usage: &TokenUsage, context_window: Option<i64>) {
self.token_info = TokenUsageInfo::new_or_append(
&self.token_info,
&Some(usage.clone()),
context_window,
);
}
}
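A caller can combine the estimate with the known context window to decide when compaction is due. This is a sketch under an assumed 80% threshold; the real trigger logic is not shown:

```rust
/// Decide whether to compact, given an estimated token count and the model's
/// context window. The 80% threshold is an assumption for illustration;
/// the point is to leave headroom for the next turn's output.
fn should_compact(estimated_tokens: i64, context_window: i64) -> bool {
    // estimated / window >= 0.8, kept in integer arithmetic
    estimated_tokens * 5 >= context_window * 4
}
```

Triggering on an estimate (rather than waiting for a hard `ContextWindowExceeded` error) lets the agent compact proactively instead of failing a turn first.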
Beyond initial context, agents gather information dynamically through tool calls:
// Model calls: read_file(path: "src/main.rs", offset: 0, limit: 100)
// Returns file content as tool output, which becomes part of context
async fn handle_read_file(params: ReadFileParams) -> Result<ToolOutput> {
let content = tokio::fs::read_to_string(&params.path).await?;
// Apply line limits, clamped to the file's length
let lines: Vec<&str> = content.lines().collect();
let start = params.offset.unwrap_or(0).min(lines.len());
let end = (start + params.limit.unwrap_or(lines.len())).min(lines.len());
let selected: String = lines[start..end]
.iter()
.enumerate()
.map(|(i, line)| format!("{:6}|{}", start + i + 1, line))
.collect::<Vec<_>>()
.join("\n");
Ok(ToolOutput::Function {
content: selected,
success: Some(true),
..Default::default()
})
}
// Model calls: shell(command: ["ls", "-la"])
// Returns command output, which becomes part of context
async fn handle_shell(params: ShellParams) -> Result<ToolOutput> {
let started = std::time::Instant::now();
let output = Command::new(&params.command[0])
.args(&params.command[1..])
.current_dir(&params.workdir.unwrap_or_default())
.output()
.await?;
let elapsed = started.elapsed();
let formatted = format_exec_output_for_model(&ExecToolCallOutput {
stdout: String::from_utf8_lossy(&output.stdout).into(),
stderr: String::from_utf8_lossy(&output.stderr).into(),
exit_code: output.status.code().unwrap_or(-1),
duration: elapsed,
});
Ok(ToolOutput::Function {
content: formatted,
success: Some(output.status.success()),
..Default::default()
})
}
// Model calls: grep_files(pattern: "TODO", path: "src/")
// Returns matches, which become part of context
async fn handle_grep(params: GrepParams) -> Result<ToolOutput> {
let matches = ripgrep_search(&params.pattern, &params.path)?;
// Format matches with file:line:content
let output = matches.iter()
.map(|m| format!("{}:{}:{}", m.file, m.line, m.content))
.collect::<Vec<_>>()
.join("\n");
Ok(ToolOutput::Function {
content: output,
success: Some(true),
..Default::default()
})
}
Claude Code uses a different approach with plugins that provide contextual capabilities:
Plugins in Claude Code can define sub-agents with their own system prompts, restricted tool access, and model selection:
---
name: code-reviewer
description: Reviews code for bugs and security issues
tools: [Glob, Grep, Read] # Limited tool access
model: sonnet
---
You are an expert code reviewer...
[This becomes the sub-agent's system prompt - its context]
Claude Code supports @file syntax in commands to inject file content:
---
description: Analyze the specified file
---
Review the following file for issues:
@$ARGUMENTS
Focus on:
1. Code quality
2. Security concerns
When a user runs /review src/app.ts, the @src/app.ts expands to include the file's content in the context.
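A toy version of that expansion, written in Rust for illustration (a hypothetical helper: it splits on whitespace, so paths containing spaces are out of scope, and Claude Code's real parser is more capable):

```rust
use std::fs;

/// Replace each whitespace-delimited `@path` token with the file's contents.
/// Hypothetical sketch; real @file expansion also handles globs and quoting.
fn expand_file_refs(prompt: &str) -> String {
    prompt
        .split_whitespace()
        .map(|tok| match tok.strip_prefix('@') {
            // Inline the file if it can be read; otherwise keep the token
            Some(path) => fs::read_to_string(path).unwrap_or_else(|_| tok.to_string()),
            None => tok.to_string(),
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

The effect is the same as the command example above: by the time the model sees the prompt, the referenced file's content is part of its context.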
Commands can include shell output:
---
description: Review recent changes
---
Files changed since last commit:
!`git diff --name-only HEAD~1`
Review each file...
Both systems assemble context in a consistent priority order:
Priority 1: System prompt (always present)
Priority 2: Project docs (session-level)
Priority 3: Environment context (turn-level)
Priority 4: Conversation history (accumulated)
Priority 5: Tool outputs (on-demand)
Both systems support injecting context mid-conversation as reminders or guidance.
Hooks can inject systemMessage content that gets added to the context whenever they fire:
{
"hookSpecificOutput": {
"permissionDecision": "allow"
},
"systemMessage": "Remember to check for SQL injection when handling user input"
}
Key hook events for injecting reminders:
| Event | When It Fires | Common Use |
|---|---|---|
| SessionStart | Beginning of session | Load project context, set behavioral mode |
| PreToolUse | Before every tool call | Security warnings, validation reminders |
| PostToolUse | After tool completes | Feedback, quality checks |
| Stop | Agent wants to stop | Completion validation |
| UserPromptSubmit | User sends message | Context enrichment |
Example: Security Reminder Hook
Claude Code includes a security reminder system that warns once per session about risky patterns:
# From security_reminder_hook.py
SECURITY_PATTERNS = [
{
"ruleName": "child_process_exec",
"substrings": ["child_process.exec", "exec("],
"reminder": """⚠️ Security Warning: Using child_process.exec()
can lead to command injection vulnerabilities.
Use execFileNoThrow() instead for safety.""",
},
{
"ruleName": "eval_injection",
"substrings": ["eval("],
"reminder": "⚠️ Security Warning: eval() executes arbitrary code...",
},
]
def main():
# file_path, content, and session_id come from the hook's JSON on stdin
# Track shown warnings per session to avoid repetition
shown_warnings = load_state(session_id)
# Check if any security pattern matches the file being written
rule_name, reminder = check_patterns(file_path, content)
warning_key = f"{rule_name}:{file_path}"
if rule_name and warning_key not in shown_warnings:
# Show warning once per session
shown_warnings.add(warning_key)
save_state(session_id, shown_warnings)
# Output to stderr - gets added to Claude's context
print(reminder, file=sys.stderr)
sys.exit(2) # Block and show warning
Session-Scoped Reminders: the set of already-shown warnings is persisted per session in ~/.claude/security_warnings_state_{session_id}.json.

Plugins can inject behavioral instructions at session start:
#!/usr/bin/env bash
# learning-output-style/hooks-handlers/session-start.sh
cat << 'EOF'
{
"hookSpecificOutput": {
"hookEventName": "SessionStart",
"additionalContext": "You are in 'learning' output style mode...
## Learning Mode Philosophy
Instead of implementing everything yourself, identify opportunities
where the user can write 5-10 lines of meaningful code...
## When to Request User Contributions
- Business logic with multiple valid approaches
- Error handling strategies
- Algorithm implementation choices..."
}
}
EOF
This additionalContext becomes part of the model's initial context for every session.
Codex supports a developer role message that supplements the system prompt:
// Configuration
pub struct Config {
/// Developer instructions override injected as a separate message
pub developer_instructions: Option<String>,
}
// Injection into context
pub fn build_initial_context(&self, turn_context: &TurnContext) -> Vec<ResponseItem> {
let mut items = Vec::new();
// Developer instructions come first
if let Some(developer_instructions) = turn_context.developer_instructions.as_deref() {
items.push(DeveloperInstructions::new(developer_instructions).into());
}
// Then user instructions (AGENTS.md)
if let Some(user_instructions) = turn_context.user_instructions.as_deref() {
items.push(UserInstructions {
text: user_instructions,
directory: turn_context.cwd.to_string_lossy().into_owned(),
}.into());
}
// Then environment context
items.push(EnvironmentContext::new(...).into());
items
}
The developer message uses a special "developer" role:
impl From<DeveloperInstructions> for ResponseItem {
fn from(di: DeveloperInstructions) -> Self {
ResponseItem::Message {
role: "developer".to_string(), // Special role
content: vec![ContentItem::InputText { text: di.text }],
}
}
}
When context is compacted, initial context is re-injected:
async fn run_compact_task(...) {
// ... summarization happens ...
// Re-build initial context (developer instructions, AGENTS.md, environment)
let initial_context = sess.build_initial_context(turn_context.as_ref());
// Build new history with initial context + preserved user messages + summary
let new_history = build_compacted_history(
initial_context, // <-- Re-injected!
&user_messages,
&summary_text
);
sess.replace_history(new_history).await;
}
This ensures the model never loses critical instructions even after long conversations.
1. Session-Scoped Deduplication Don't spam the same reminder repeatedly:
if warning_key not in shown_warnings:
shown_warnings.add(warning_key)
save_state(session_id, shown_warnings)
show_reminder()
2. Context-Appropriate Triggers Use the right hook event:
- SessionStart: One-time behavioral setup
- PreToolUse: Warnings before risky operations
- PostToolUse: Feedback on results
- Stop: Validation before completing

3. Concise Messages Keep reminders brief - they consume context tokens:
# Good: Focused warning
"⚠️ eval() is a security risk. Consider JSON.parse() instead."
# Bad: Essay-length explanation
"⚠️ The eval() function evaluates arbitrary JavaScript code, which
poses significant security risks including... [500 more words]"
4. Structured Output Use the expected JSON format for hooks:
{
"hookSpecificOutput": {...},
"systemMessage": "Brief reminder text"
}
This architecture ensures the model always has the context it needs while managing the inherent limitations of context windows.