
# Context

> Manage agent context window for stable, long-running execution

## Overview

The context is an agent's working memory: the messages (user inputs, assistant responses, tool calls, tool results) that the LLM sees on every reasoning step. As a conversation grows, the raw context eventually exceeds the model's context window. AgentScope keeps long-running agents within that limit through three mechanisms:

* **Context compression** — summarizes older messages once token usage approaches the model's limit.
* **Tool result truncation** — caps oversized tool outputs before they enter the context.
* **Context offloading** — persists removed content to external storage so the agent can retrieve it later.

Before each model call, the agent assembles a single API input from three layers — the structure below shows what flows into that call:

<Tree>
  <Tree.Folder name="Model API Input" defaultOpen>
    <Tree.Folder name="System Prompt" defaultOpen>
      <Tree.File name="Base system prompt" />

      <Tree.File name="Skill instructions (from Toolkit)" />

      <Tree.File name="on_system_prompt middleware transforms" />
    </Tree.Folder>

    <Tree.File name="Summary (compressed history, if any)" />

    <Tree.File name="Context (recent uncompressed messages)" />
  </Tree.Folder>
</Tree>

How each layer is built:

1. **System prompt** — starts from the `system_prompt` passed at agent creation, then appends skill instructions (each skill's name and description, sourced from the toolkit), then runs every `on_system_prompt` [middleware](/versions/2.0.3/en/building-blocks/middleware) hook in order.
2. **Summary** — the compressed digest of older messages, present only after a compression has occurred.
3. **Context** — the recent uncompressed messages (user inputs, assistant responses, tool calls, tool results).

<Tip>
  Use the `on_system_prompt` middleware hook to inject dynamic context — workspace instructions, time-sensitive facts, environment details — without rewriting the base prompt.
</Tip>

## Compact Context

AgentScope keeps the context within the model's window using two automatic mechanisms governed by `ContextConfig`: **context compression** (summarize older messages) and **tool result truncation** (cap oversized tool outputs). Both run transparently, so the agent continues working without interruption.

### Configure ContextConfig

`ContextConfig` is passed to the agent at construction time:

```python theme={null}
from agentscope.agent import Agent
from agentscope.agent import ContextConfig

agent = Agent(
    name="my_agent",
    system_prompt="...",
    model=model,
    toolkit=toolkit,
    context_config=ContextConfig(
        trigger_ratio=0.8,
        reserve_ratio=0.1,
        tool_result_limit=3000,
    ),
)
```

Available fields:

| Parameter            | Type    | Description                                                                                             |
| -------------------- | ------- | ------------------------------------------------------------------------------------------------------- |
| `trigger_ratio`      | `float` | Compression activates when token usage exceeds this ratio of the model's context size (capped at `0.9`) |
| `reserve_ratio`      | `float` | Proportion of context tokens kept as recent messages after compression                                  |
| `tool_result_limit`  | `int`   | Maximum tokens per tool result; outputs exceeding this are truncated                                    |
| `compression_prompt` | `str`   | The prompt that guides the model to generate the summary                                                |
| `summary_template`   | `str`   | String template for formatting the summary into the context                                             |
| `summary_schema`     | `dict`  | JSON Schema constraining the model's structured summary output                                          |
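To see how the ratios translate into token budgets, here is a small worked calculation for an assumed 128,000-token model. `Budget` is a hypothetical helper, not part of AgentScope; it only mirrors `trigger_ratio` and `reserve_ratio` and applies the `0.9` cap stated in the table.

```python
from dataclasses import dataclass

MAX_TRIGGER_RATIO = 0.9  # cap on trigger_ratio stated in the ContextConfig table


@dataclass
class Budget:
    """Hypothetical helper mirroring two ContextConfig fields; not AgentScope code."""

    trigger_ratio: float
    reserve_ratio: float

    def thresholds(self, context_size: int) -> dict[str, int]:
        ratio = min(self.trigger_ratio, MAX_TRIGGER_RATIO)
        trigger = round(ratio * context_size)
        return {
            "trigger_at": trigger,                                   # compress above this
            "reserved_recent": round(self.reserve_ratio * context_size),  # kept after compression
            "headroom": context_size - trigger,                      # free for the summary call
        }


print(Budget(trigger_ratio=0.8, reserve_ratio=0.1).thresholds(128_000))
```

With the values from the example above (`trigger_ratio=0.8`, `reserve_ratio=0.1`), compression starts above 102,400 tokens, roughly 12,800 tokens of recent messages survive it, and 25,600 tokens remain free for the compression call.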

### Compress Context

Compression runs automatically before each reasoning step. The flow:

<Steps>
  <Step title="Count tokens">
    The agent totals the tokens of the system prompt, summary, context, and tool schemas.
  </Step>

  <Step title="Check threshold">
    If total tokens exceed `trigger_ratio × context_size`, compression activates. Otherwise the agent proceeds with the model call as usual.
  </Step>

  <Step title="Split messages">
    Older messages are marked for compression; recent messages within `reserve_ratio × context_size` are kept. The split never separates a tool call from its tool result.
  </Step>

  <Step title="Generate summary">
    The model produces a structured summary from the older messages, with five fields: `task_overview`, `current_state`, `important_discoveries`, `next_steps`, `context_to_preserve`.
  </Step>

  <Step title="Update state">
    The summary replaces the compressed messages; the reserved messages become the new context. The agent then continues its reasoning step.
  </Step>
</Steps>

<Note>
  Capping `trigger_ratio` at `0.9` keeps at least 10% of the context size free for the compression model call itself, since the model needs room to generate the summary. A lower `trigger_ratio` leaves correspondingly more headroom (20% at `0.8`).
</Note>
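The threshold check and the split step can be approximated in a few lines. This is a sketch, not AgentScope's implementation: messages are plain dicts with hypothetical `role` and `tokens` keys, token counts are precomputed, and tool results are assumed to directly follow the assistant message that issued the call.

```python
def should_compress(total_tokens: int, context_size: int, trigger_ratio: float) -> bool:
    # trigger_ratio is capped at 0.9, leaving headroom for the summary call.
    return total_tokens > min(trigger_ratio, 0.9) * context_size


def split_for_compression(
    msgs: list[dict], context_size: int, reserve_ratio: float
) -> tuple[list[dict], list[dict]]:
    """Return (to_compress, to_keep) for dicts with hypothetical "role"/"tokens" keys."""
    budget = reserve_ratio * context_size
    used, split = 0, len(msgs)
    # Walk backwards from the newest message while it still fits the reserve budget.
    while split > 0 and used + msgs[split - 1]["tokens"] <= budget:
        split -= 1
        used += msgs[split]["tokens"]
    # If the kept region would start with a tool result, pull the split back so the
    # call stays with its results (this may slightly exceed the reserve budget).
    while 0 < split < len(msgs) and msgs[split]["role"] == "tool":
        split -= 1
    return msgs[:split], msgs[split:]
```

Pulling the split backwards is one possible way to keep pairs intact; moving it forwards (compressing the orphaned results too) would honor the budget strictly at the cost of hiding the latest tool output.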

Compression can also be triggered manually by calling the agent's `compress_context()` method. Without arguments, it uses the agent's stored `context_config`; pass a one-off `ContextConfig` to override:

```python theme={null}
from agentscope.agent import ContextConfig

# Compress only if usage exceeds the agent's stored trigger_ratio
await agent.compress_context()

# Or override the config for this single call (e.g. compress more aggressively)
await agent.compress_context(
    context_config=ContextConfig(trigger_ratio=0.5, reserve_ratio=0.1),
)
```

The method is a no-op when token usage is below `trigger_ratio × context_size`, so it is safe to call between turns or at any custom checkpoint.

### Truncate Tool Results

After each tool call, the agent compares the result's token count against `tool_result_limit`. If the limit is exceeded, the result is split into a reserved portion (kept in context) and an offloaded portion (handed to the offloader if one is attached — see [Offload Context](#offload-context)).

A truncation marker is appended to the reserved portion so the agent knows the output was clipped:

```
<<<TRUNCATED>>>
<system-reminder>The remaining content has been omitted for limited context.</system-reminder>
```

When an offloader is attached, the marker also points the agent to the persisted full output:

```
<<<TRUNCATED>>>
<system-reminder>The remaining content has been omitted for limited context. You can refer to the file in '/path/to/tool_result-<id>.txt' for the truncated content if needed.</system-reminder>
```

<Warning>
  Setting `tool_result_limit` too low may starve the agent of critical tool output. Setting it too high risks one result filling the entire context.
</Warning>

## Offload Context

Context offloading writes content the agent has dropped — compressed messages, truncated tool outputs — to external storage, so the agent can read it back later via its file tools (Read, Grep, Glob) when it needs a detail that was compressed away.

### Use the Offloader Protocol

Offloading is wired through the `Offloader` protocol — a structural contract with two methods:

| Method                                         | Description                                                                                  |
| ---------------------------------------------- | -------------------------------------------------------------------------------------------- |
| `offload_context(session_id, msgs)`            | Persist compressed messages; returns a reference (e.g. a file path) to the persisted content |
| `offload_tool_result(session_id, tool_result)` | Persist a truncated tool result; returns a reference to the persisted content                |
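Because the contract is structural, any class with these two async methods qualifies. The sketch below uses a local `typing.Protocol` mirror to show this; `OffloaderLike` and `NullOffloader` are hypothetical names, and the `**kwargs` parameter follows the custom offloader example later on this page.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class OffloaderLike(Protocol):
    """Hypothetical local mirror of the two Offloader methods; not AgentScope's class."""

    async def offload_context(self, session_id: str, msgs: list[Any], **kwargs: Any) -> str: ...

    async def offload_tool_result(self, session_id: str, tool_result: Any, **kwargs: Any) -> str: ...


class NullOffloader:
    # No base class: structural typing only requires matching methods.
    async def offload_context(self, session_id: str, msgs: list[Any], **kwargs: Any) -> str:
        return f"null://{session_id}/context"

    async def offload_tool_result(self, session_id: str, tool_result: Any, **kwargs: Any) -> str:
        return f"null://{session_id}/tool_result"


print(isinstance(NullOffloader(), OffloaderLike))
```

`runtime_checkable` only checks that the methods exist, not their signatures, so a static type checker remains the stronger guarantee.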

Pass any object satisfying this protocol to the agent's `offloader` argument. AgentScope's [`workspace`](/versions/2.0.3/en/building-blocks/workspace) module ships ready-made implementations:

```python theme={null}
from agentscope.agent import Agent
from agentscope.workspace import LocalWorkspace

workspace = LocalWorkspace(workdir="/tmp/agent_workspace")
await workspace.initialize()

agent = Agent(
    name="my_agent",
    system_prompt="...",
    model=model,
    toolkit=toolkit,
    offloader=workspace,
)
```

Without an `offloader`, compressed messages and truncated tool results are simply dropped after they leave the context window.

### Use LocalWorkspace

`LocalWorkspace` writes offloaded content under `workdir`, isolating each agent run by `session_id`:

<Tree>
  <Tree.Folder name="{workdir}" defaultOpen>
    <Tree.Folder name="data" defaultOpen>
      <Tree.File name="{sha256}.png" />
    </Tree.Folder>

    <Tree.Folder name="sessions" defaultOpen>
      <Tree.Folder name="{session_id}" defaultOpen>
        <Tree.File name="context.jsonl" />

        <Tree.File name="tool_result-{tool_id}.txt" />
      </Tree.Folder>
    </Tree.Folder>

    <Tree.Folder name="skills">
      <Tree.File name="..." />
    </Tree.Folder>
  </Tree.Folder>
</Tree>

How content is laid out:

* **`sessions/{session_id}/`** — one directory per agent session, so concurrent agents don't collide. Compressed messages append to `context.jsonl`; each truncated tool result becomes its own `tool_result-{tool_id}.txt`.
* **`data/`** — multimodal files (images, audio) referenced by offloaded messages, deduplicated by SHA-256 content hash.
* **`skills/`** — unrelated to offloading; the workspace also serves as the agent's skill directory.
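The layout rules above can be re-created with `pathlib` and `hashlib`. This is a hypothetical illustration of the conventions (per-session directories, an append-only `context.jsonl`, SHA-256-named media), not `LocalWorkspace`'s actual code; all function names are made up.

```python
import hashlib
from pathlib import Path


def session_dir(workdir: Path, session_id: str) -> Path:
    return workdir / "sessions" / session_id


def tool_result_path(workdir: Path, session_id: str, tool_id: str) -> Path:
    return session_dir(workdir, session_id) / f"tool_result-{tool_id}.txt"


def append_context(workdir: Path, session_id: str, json_lines: list[str]) -> Path:
    # Compressed messages are appended, so earlier offloads are never overwritten.
    path = session_dir(workdir, session_id) / "context.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        for line in json_lines:
            f.write(line + "\n")
    return path


def store_media(workdir: Path, data: bytes, suffix: str) -> Path:
    # Name files by content hash: identical bytes map to the same path, stored once.
    path = workdir / "data" / f"{hashlib.sha256(data).hexdigest()}{suffix}"
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
    return path
```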

### Create Custom Offloader

For backends other than the local filesystem (databases, cloud blobs, vector stores), implement the `Offloader` protocol — no inheritance required, since it is a structural protocol:

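```python theme={null}
import asyncio
import uuid
from typing import Any

import boto3  # assumes boto3 is installed and AWS credentials are configured

from agentscope.message import Msg, ToolResultBlock


class S3Offloader:
    def __init__(self, bucket: str, prefix: str) -> None:
        self.bucket = bucket
        self.prefix = prefix
        self._client = boto3.client("s3")

    async def _upload(self, bucket: str, key: str, content: str) -> None:
        # boto3 is synchronous; run the blocking call in a worker thread.
        await asyncio.to_thread(
            self._client.put_object,
            Bucket=bucket,
            Key=key,
            Body=content.encode("utf-8"),
        )

    async def offload_context(
        self,
        session_id: str,
        msgs: list[Msg],
        **kwargs: Any,
    ) -> str:
        # S3 objects cannot be appended to, so write one object per call
        # instead of overwriting earlier offloads in a single context.jsonl.
        key = f"{self.prefix}/sessions/{session_id}/context-{uuid.uuid4().hex}.jsonl"
        content = "\n".join(m.model_dump_json() for m in msgs)
        await self._upload(self.bucket, key, content)
        return f"s3://{self.bucket}/{key}"

    async def offload_tool_result(
        self,
        session_id: str,
        tool_result: ToolResultBlock,
        **kwargs: Any,
    ) -> str:
        key = f"{self.prefix}/sessions/{session_id}/tool_result-{tool_result.id}.txt"
        # Extract text content from the tool result blocks and upload.
        ...
        return f"s3://{self.bucket}/{key}"
```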

Pass the instance into `Agent(offloader=...)` like any built-in workspace.
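For unit tests, an in-memory offloader avoids touching disk or the network. The class below is a hypothetical sketch: it keeps offloaded messages and tool results in dictionaries, appends context per session (mirroring how `LocalWorkspace` appends to `context.jsonl`), and returns made-up `mem://` references.

```python
from collections import defaultdict
from typing import Any


class InMemoryOffloader:
    """Hypothetical test double satisfying the Offloader protocol structurally."""

    def __init__(self) -> None:
        self.contexts: defaultdict[str, list[Any]] = defaultdict(list)
        self.tool_results: defaultdict[str, dict[str, Any]] = defaultdict(dict)

    async def offload_context(self, session_id: str, msgs: list[Any], **kwargs: Any) -> str:
        # Append so that repeated compressions accumulate, like an append-only log.
        self.contexts[session_id].extend(msgs)
        return f"mem://{session_id}/context"

    async def offload_tool_result(self, session_id: str, tool_result: Any, **kwargs: Any) -> str:
        # Fall back to a counter if the result object has no `id` attribute.
        tool_id = getattr(tool_result, "id", str(len(self.tool_results[session_id])))
        self.tool_results[session_id][tool_id] = tool_result
        return f"mem://{session_id}/tool_result-{tool_id}"
```

After a test run, assertions can inspect `offloader.contexts[session_id]` to confirm which messages were compressed away.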

## Further Reading

<CardGroup cols={2}>
  <Card title="Workspace" icon="folder-tree" href="/versions/2.0.3/en/building-blocks/workspace">
    Built-in offloader implementations and the agent's working environment
  </Card>

  <Card title="Agent" icon="robot" href="/versions/2.0.3/en/building-blocks/agent">
    The ReAct loop and how context flows through reasoning steps
  </Card>

  <Card title="Middleware" icon="layer-group" href="/versions/2.0.3/en/building-blocks/middleware">
    Intercept model calls and system prompt composition with middleware hooks
  </Card>

  <Card title="Tool" icon="wrench" href="/versions/2.0.3/en/building-blocks/tool">
    Tools that produce results subject to compression
  </Card>
</CardGroup>
