Documentation Index
Fetch the complete documentation index at: https://docs.agentscope.io/llms.txt
Use this file to discover all available pages before exploring further.
Overview
The context is an agent’s working memory — the messages (user inputs, assistant responses, tool calls, tool results) that the LLM sees on every reasoning step. As conversations grow, raw context eventually exceeds the model’s window. AgentScope keeps an agent runnable indefinitely through three mechanisms:- Context compression — summarizes older messages once token usage approaches the model’s limit.
- Tool result truncation — caps oversized tool outputs before they enter the context.
- Context offloading — persists removed content to external storage so the agent can retrieve it later.
Model API Input
System Prompt
Base system prompt
Skill instructions (from Toolkit)
on_system_prompt middleware transforms
Summary (compressed history, if any)
Context (recent uncompressed messages)
- System prompt — starts from the
system_promptpassed at agent creation, then appends skill instructions (each skill’s name and description, sourced from the toolkit), then runs everyon_system_promptmiddleware hook in order. - Summary — the compressed digest of older messages, present only after a compression has occurred.
- Context — the recent uncompressed messages (user inputs, assistant responses, tool calls, tool results).
Compact Context
When the context window fills up, AgentScope keeps it in shape with two automatic mechanisms governed byContextConfig: context compression (summarize older messages) and tool result truncation (cap oversized tool outputs). Both run transparently — the agent continues working without interruption.
Configure ContextConfig
ContextConfig is passed to the agent at construction time:
| Parameter | Type | Description |
|---|---|---|
trigger_ratio | float | Compression activates when token usage exceeds this ratio of the model’s context size (capped at 0.9) |
reserve_ratio | float | Proportion of context tokens kept as recent messages after compression |
tool_result_limit | int | Maximum tokens per tool result; outputs exceeding this are truncated |
compression_prompt | str | The prompt that guides the model to generate the summary |
summary_template | str | String template for formatting the summary into the context |
summary_schema | dict | JSON Schema constraining the model’s structured summary output |
Compress Context
Compression runs automatically before each reasoning step. The flow:Check threshold
If total tokens exceed
trigger_ratio × context_size, compression activates. Otherwise the agent proceeds with the model call as usual.Split messages
Older messages are marked for compression; recent messages within
reserve_ratio × context_size are kept. Tool call / result pairs are kept intact across the split.Generate summary
The model produces a structured summary from the older messages, with five fields:
task_overview, current_state, important_discoveries, next_steps, context_to_preserve.The remaining 10% between
trigger_ratio (max 0.9) and the full context size is reserved for the compression model call itself — the model needs room to generate the summary.compress_context() method. Without arguments, it uses the agent’s stored context_config; pass a one-off ContextConfig to override:
trigger_ratio × context_size, so it is safe to call between turns or at any custom checkpoint.
Truncate Tool Results
After each tool call, the agent compares the result’s token count againsttool_result_limit. If the limit is exceeded, the result is split into a reserved portion (kept in context) and an offloaded portion (handed to the offloader if one is attached — see Offload Context).
A truncation marker is appended to the reserved portion so the agent knows the output was clipped:
Offload Context
Context offloading writes content the agent has dropped — compressed messages, truncated tool outputs — to external storage, so the agent can read it back later via its file tools (Read, Grep, Glob) when it needs a detail that was compressed away.Use the Offloader Protocol
Offloading is wired through theOffloader protocol — a structural contract with two methods:
| Method | Description |
|---|---|
offload_context(session_id, msgs) | Persist compressed messages; returns a reference (e.g. a file path) to the persisted content |
offload_tool_result(session_id, tool_result) | Persist a truncated tool result; returns a reference to the persisted content |
offloader argument. AgentScope’s workspace module ships ready-made implementations:
offloader, compressed messages and truncated tool results are simply dropped after they leave the context window.
Use LocalWorkspace
LocalWorkspace writes offloaded content under workdir, isolating each agent run by session_id:
{workdir}
data
{sha256}.png
sessions
{session_id}
context.jsonl
tool_result-{tool_id}.txt
skills
sessions/{session_id}/— one directory per agent session, so concurrent agents don’t collide. Compressed messages append tocontext.jsonl; each truncated tool result becomes its owntool_result-{tool_id}.txt.data/— multimodal files (images, audio) referenced by offloaded messages, deduplicated by SHA-256 content hash.skills/— unrelated to offloading; the workspace also serves as the agent’s skill directory.
Create Custom Offloader
For backends other than the local filesystem (databases, cloud blobs, vector stores), implement theOffloader protocol — no inheritance required, since it is a structural protocol:
Agent(offloader=...) like any built-in workspace.
Further Reading
Workspace
Built-in offloader implementations and the agent’s working environment
Agent
The ReAct loop and how context flows through reasoning steps
Middleware
Intercept model calls and system prompt composition with middleware hooks
Tool
Tools that produce results subject to compression