Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.agentscope.io/llms.txt

Use this file to discover all available pages before exploring further.

Overview

The context is an agent’s working memory — the messages (user inputs, assistant responses, tool calls, tool results) that the LLM sees on every reasoning step. As conversations grow, raw context eventually exceeds the model’s window. AgentScope keeps an agent runnable indefinitely through three mechanisms:
  • Context compression — summarizes older messages once token usage approaches the model’s limit.
  • Tool result truncation — caps oversized tool outputs before they enter the context.
  • Context offloading — persists removed content to external storage so the agent can retrieve it later.
Before each model call, the agent assembles a single API input from three layers — the structure below shows what flows into that call:
Model API Input
System Prompt
Base system prompt
Skill instructions (from Toolkit)
on_system_prompt middleware transforms
Summary (compressed history, if any)
Context (recent uncompressed messages)
How each layer is built:
  1. System prompt — starts from the system_prompt passed at agent creation, then appends skill instructions (each skill’s name and description, sourced from the toolkit), then runs every on_system_prompt middleware hook in order.
  2. Summary — the compressed digest of older messages, present only after a compression has occurred.
  3. Context — the recent uncompressed messages (user inputs, assistant responses, tool calls, tool results).
Use the on_system_prompt middleware hook to inject dynamic context — workspace instructions, time-sensitive facts, environment details — without rewriting the base prompt.

Compact Context

When the context window fills up, AgentScope keeps it in shape with two automatic mechanisms governed by ContextConfig: context compression (summarize older messages) and tool result truncation (cap oversized tool outputs). Both run transparently — the agent continues working without interruption.

Configure ContextConfig

ContextConfig is passed to the agent at construction time:
from agentscope import Agent
from agentscope.agent import ContextConfig

agent = Agent(
    name="my_agent",
    system_prompt="...",
    model=model,
    toolkit=toolkit,
    context_config=ContextConfig(
        trigger_ratio=0.8,
        reserve_ratio=0.1,
        tool_result_limit=3000,
    ),
)
Available fields:
ParameterTypeDescription
trigger_ratiofloatCompression activates when token usage exceeds this ratio of the model’s context size (capped at 0.9)
reserve_ratiofloatProportion of context tokens kept as recent messages after compression
tool_result_limitintMaximum tokens per tool result; outputs exceeding this are truncated
compression_promptstrThe prompt that guides the model to generate the summary
summary_templatestrString template for formatting the summary into the context
summary_schemadictJSON Schema constraining the model’s structured summary output

Compress Context

Compression runs automatically before each reasoning step. The flow:
1

Count tokens

The agent totals the tokens of the system prompt, summary, context, and tool schemas.
2

Check threshold

If total tokens exceed trigger_ratio × context_size, compression activates. Otherwise the agent proceeds with the model call as usual.
3

Split messages

Older messages are marked for compression; recent messages within reserve_ratio × context_size are kept. Tool call / result pairs are kept intact across the split.
4

Generate summary

The model produces a structured summary from the older messages, with five fields: task_overview, current_state, important_discoveries, next_steps, context_to_preserve.
5

Update state

The summary replaces the compressed messages; the reserved messages become the new context. The agent then continues its reasoning step.
The remaining 10% between trigger_ratio (max 0.9) and the full context size is reserved for the compression model call itself — the model needs room to generate the summary.
Compression can also be triggered manually by calling the agent’s compress_context() method. Without arguments, it uses the agent’s stored context_config; pass a one-off ContextConfig to override:
# Force-check using the agent's default config
await agent.compress_context()

# Or override the config for this single call (e.g. compress more aggressively)
from agentscope.agent import ContextConfig

await agent.compress_context(
    context_config=ContextConfig(trigger_ratio=0.5, reserve_ratio=0.1),
)
The method is a no-op when token usage is below trigger_ratio × context_size, so it is safe to call between turns or at any custom checkpoint.

Truncate Tool Results

After each tool call, the agent compares the result’s token count against tool_result_limit. If the limit is exceeded, the result is split into a reserved portion (kept in context) and an offloaded portion (handed to the offloader if one is attached — see Offload Context). A truncation marker is appended to the reserved portion so the agent knows the output was clipped:
<<<TRUNCATED>>>
<system-reminder>The remaining content has been omitted for limited context.</system-reminder>
When an offloader is attached, the marker also points the agent to the persisted full output:
<<<TRUNCATED>>>
<system-reminder>The remaining content has been omitted for limited context. You can refer to the file in '/path/to/tool_result-<id>.txt' for the truncated content if needed.</system-reminder>
Setting tool_result_limit too low may starve the agent of critical tool output. Setting it too high risks one result filling the entire context.

Offload Context

Context offloading writes content the agent has dropped — compressed messages, truncated tool outputs — to external storage, so the agent can read it back later via its file tools (Read, Grep, Glob) when it needs a detail that was compressed away.

Use the Offloader Protocol

Offloading is wired through the Offloader protocol — a structural contract with two methods:
MethodDescription
offload_context(session_id, msgs)Persist compressed messages; returns a reference (e.g. a file path) to the persisted content
offload_tool_result(session_id, tool_result)Persist a truncated tool result; returns a reference to the persisted content
Pass any object satisfying this protocol to the agent’s offloader argument. AgentScope’s workspace module ships ready-made implementations:
from agentscope import Agent
from agentscope.workspace import LocalWorkspace

workspace = LocalWorkspace(workdir="/tmp/agent_workspace")
await workspace.initialize()

agent = Agent(
    name="my_agent",
    system_prompt="...",
    model=model,
    toolkit=toolkit,
    offloader=workspace,
)
Without an offloader, compressed messages and truncated tool results are simply dropped after they leave the context window.

Use LocalWorkspace

LocalWorkspace writes offloaded content under workdir, isolating each agent run by session_id:
{workdir}
data
{sha256}.png
sessions
{session_id}
context.jsonl
tool_result-{tool_id}.txt
How content is laid out:
  • sessions/{session_id}/ — one directory per agent session, so concurrent agents don’t collide. Compressed messages append to context.jsonl; each truncated tool result becomes its own tool_result-{tool_id}.txt.
  • data/ — multimodal files (images, audio) referenced by offloaded messages, deduplicated by SHA-256 content hash.
  • skills/ — unrelated to offloading; the workspace also serves as the agent’s skill directory.

Create Custom Offloader

For backends other than the local filesystem (databases, cloud blobs, vector stores), implement the Offloader protocol — no inheritance required, since it is a structural protocol:
from typing import Any
from agentscope.message import Msg, ToolResultBlock

class S3Offloader:
    def __init__(self, bucket: str, prefix: str) -> None:
        self.bucket = bucket
        self.prefix = prefix

    async def offload_context(
        self,
        session_id: str,
        msgs: list[Msg],
        **kwargs: Any,
    ) -> str:
        key = f"{self.prefix}/sessions/{session_id}/context.jsonl"
        content = "\n".join(m.model_dump_json() for m in msgs)
        await self._upload(self.bucket, key, content)
        return f"s3://{self.bucket}/{key}"

    async def offload_tool_result(
        self,
        session_id: str,
        tool_result: ToolResultBlock,
        **kwargs: Any,
    ) -> str:
        key = f"{self.prefix}/sessions/{session_id}/tool_result-{tool_result.id}.txt"
        # Extract text content from the tool result blocks and upload.
        ...
        return f"s3://{self.bucket}/{key}"
Pass the instance into Agent(offloader=...) like any built-in workspace.

Further Reading

Workspace

Built-in offloader implementations and the agent’s working environment

Agent

The ReAct loop and how context flows through reasoning steps

Middleware

Intercept model calls and system prompt composition with middleware hooks

Tool

Tools that produce results subject to compression