Overview

Agent middleware is the mechanism for injecting custom logic — logging, tracing, input rewriting, access control — into key points of the agent execution pipeline, without modifying the agent or model code. AgentScope exposes 5 hook positions, covering the full path from the outer reply process down to the raw model API call:
Position | Type | Description
--- | --- | ---
on_reply | Onion | Wraps a complete reply, covering all ReAct rounds, tool executions, and the final output
on_reasoning | Onion | Wraps a single ReAct round’s reasoning step (input assembly → model call → stream decoding)
on_acting | Onion | Wraps a single tool call execution
on_model_call | Onion | Wraps the underlying ChatModel API call; this is the layer closest to the model
on_system_prompt | Transformer | Fires every time the system prompt is assembled; multiple middlewares chain in sequence, each transforming the previous one’s output
The two types differ as follows:
  • Onion — middleware wraps the next handler, allowing logic before/after next_handler() and observation of the intermediate event stream.
  • Transformer — middlewares form a pipeline; the previous one’s output feeds into the next one. There is no “inner layer” concept.
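The difference between the two types can be sketched in plain Python, without AgentScope. The helper names (`compose_onion`, `run_transformers`) and the simplified one-argument signatures are hypothetical and exist only for illustration; they are not the framework's internals.

```python
import asyncio
from typing import Awaitable, Callable

Handler = Callable[[str], Awaitable[str]]
Onion = Callable[[str, Handler], Awaitable[str]]
Transformer = Callable[[str], Awaitable[str]]


def compose_onion(layers: list[Onion], core: Handler) -> Handler:
    """Wrap `core` so that layers[0] becomes the outermost layer."""
    handler = core
    for layer in reversed(layers):
        # Bind the handler built so far as this layer's next_handler.
        def wrapped(value: str, _layer=layer, _next=handler) -> Awaitable[str]:
            return _layer(value, _next)
        handler = wrapped
    return handler


async def run_transformers(steps: list[Transformer], value: str) -> str:
    """Feed each transformer's output into the next one; nothing is wrapped."""
    for step in steps:
        value = await step(value)
    return value


log: list[str] = []


def make_layer(name: str) -> Onion:
    async def layer(value: str, next_handler: Handler) -> str:
        log.append(f"{name} pre")          # runs before the inner layers
        result = await next_handler(value)
        log.append(f"{name} post")         # runs after the inner layers
        return result
    return layer


async def core(value: str) -> str:
    log.append("core")
    return value.upper()


async def add_suffix(prompt: str) -> str:
    return prompt + " Be concise."


async def add_prefix(prompt: str) -> str:
    return "[v2] " + prompt


onion = compose_onion([make_layer("mw1"), make_layer("mw2")], core)
onion_result = asyncio.run(onion("hi"))
pipeline_result = asyncio.run(
    run_transformers([add_suffix, add_prefix], "You are helpful.")
)
```

In this sketch an onion layer decides when (and whether) the inner handler runs, while a transformer only sees the value handed to it and returns a new one.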
The diagram below shows how these hooks nest within the agent lifecycle. on_system_prompt is embedded inside on_reasoning because it fires when the reasoning step assembles the system prompt:
on_reply
  └── ReAct loop (per round)
        ├── on_reasoning
        │     ├── on_system_prompt (system prompt assembly)
        │     └── on_model_call (model API call)
        └── on_acting (once per tool call)
on_acting currently wraps only tool execution inside the agent runtime; tools dispatched outside the agent via external execution are not tracked by on_acting.

Equip Middleware

AgentScope packages a set of hooks into a class — a single middleware class can implement any subset of the 5 hook positions at the same time. Pass instances to Agent(middlewares=[...]) to equip them:
from agentscope import Agent
from agentscope.middleware import TracingMiddleware

agent = Agent(
    name="assistant",
    system_prompt="You are a helpful assistant.",
    model=model,
    toolkit=toolkit,
    middlewares=[TracingMiddleware()],
)
At construction time the agent scans each middleware instance, checks which hooks it actually implements, and routes it into the matching position-specific execution lists. Unimplemented positions are skipped automatically with no call overhead.
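One common way to perform this kind of scan is to compare each hook on the middleware's class with the base class's version and keep only the ones that were overridden. The sketch below illustrates that idea with a hypothetical stand-in base class (`HookBase`) and hypothetical helpers (`implemented_hooks`, `route`); it is an assumption about the general technique, not AgentScope's actual implementation.

```python
HOOK_NAMES = (
    "on_reply",
    "on_reasoning",
    "on_acting",
    "on_model_call",
    "on_system_prompt",
)


class HookBase:
    """Hypothetical stand-in for a middleware base class; every hook is a no-op."""

    def on_reply(self, *args): ...
    def on_reasoning(self, *args): ...
    def on_acting(self, *args): ...
    def on_model_call(self, *args): ...
    def on_system_prompt(self, *args): ...


def implemented_hooks(middleware: HookBase) -> list[str]:
    """Return the names of the hooks that the middleware's class overrides."""
    cls = type(middleware)
    return [
        name
        for name in HOOK_NAMES
        if getattr(cls, name) is not getattr(HookBase, name)
    ]


def route(middlewares: list[HookBase]) -> dict[str, list[HookBase]]:
    """Build one execution list per hook position, preserving list order."""
    routed: dict[str, list[HookBase]] = {name: [] for name in HOOK_NAMES}
    for mw in middlewares:
        for name in implemented_hooks(mw):
            routed[name].append(mw)
    return routed


class TimingOnly(HookBase):
    def on_model_call(self, *args):
        return "timed"


class PromptAndReply(HookBase):
    def on_reply(self, *args):
        return "reply"

    def on_system_prompt(self, *args):
        return "prompt"


timing, both = TimingOnly(), PromptAndReply()
routed = route([timing, both])
```

Because a middleware never appears in the list for a position it does not implement, that position costs nothing at call time for that middleware.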

Built-in Middleware

TracingMiddleware

TracingMiddleware wires the full agent lifecycle to OpenTelemetry tracing. It instruments on_reply, on_model_call, and on_acting, producing hierarchical spans. Before using it, register a TracerProvider and an OTLP exporter in the process:
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces")),
)
trace.set_tracer_provider(provider)
Then attach TracingMiddleware to the agent:
from agentscope import Agent
from agentscope.middleware import TracingMiddleware

agent = Agent(
    name="assistant",
    system_prompt="You are a helpful assistant.",
    model=model,
    toolkit=toolkit,
    middlewares=[TracingMiddleware()],
)
Each reply produces a nested span tree. The key attributes captured at each level are:
From on_reply:
  • Agent name, session ID, reply ID
  • Input messages and the final output message
  • HITL pending tool calls
  • External execution pending tool calls
When no TracerProvider is configured, every hook short-circuits directly to next_handler() — no spans are created, no attributes are computed — making the overhead negligible.
When the agent receives an ExternalExecutionResultEvent (a tool executed outside the agent), TracingMiddleware synthesizes a compensating span for each external execution result, preserving full observability for tools run by external systems.

Custom Middleware

Subclass MiddlewareBase and implement only the hooks you need — leave the rest alone. The example below covers 4 positions in a single middleware. Each onion hook receives an input_kwargs dict carrying the fields that flow into the wrapped layer; forward it with next_handler(**input_kwargs), or pass keyword arguments to override specific fields:
from typing import AsyncGenerator, Awaitable, Callable

from agentscope import Agent
from agentscope.event import AgentEvent
from agentscope.message import Msg
from agentscope.middleware import MiddlewareBase
from agentscope.model import ChatResponse


class FullObservabilityMiddleware(MiddlewareBase):
    """Observe reply, reasoning, model_call, and system_prompt at once."""

    async def on_reply(
        self,
        agent: Agent,
        # {"inputs": Msg | list[Msg] | UserConfirmResultEvent | ExternalExecutionResultEvent | None}
        input_kwargs: dict,
        next_handler: Callable[..., AsyncGenerator[AgentEvent | Msg, None]],
    ) -> AsyncGenerator[AgentEvent | Msg, None]:
        print(f"[reply] start for {agent.name}")
        async for item in next_handler(**input_kwargs):
            yield item
        print(f"[reply] end for {agent.name}")

    async def on_reasoning(
        self,
        agent: Agent,
        # {"tool_choice": ToolChoice | None}
        input_kwargs: dict,
        next_handler: Callable[..., AsyncGenerator[AgentEvent, None]],
    ) -> AsyncGenerator[AgentEvent, None]:
        print("[reasoning] start")
        async for event in next_handler(**input_kwargs):
            yield event
        print("[reasoning] end")

    async def on_model_call(
        self,
        agent: Agent,
        # {"messages": list[Msg], "tools": list[dict], "tool_choice": ToolChoice | None, "current_model": ChatModelBase}
        input_kwargs: dict,
        next_handler: Callable[
            ..., Awaitable[ChatResponse | AsyncGenerator[ChatResponse, None]]
        ],
    ) -> ChatResponse | AsyncGenerator[ChatResponse, None]:
        print(f"[model_call] {input_kwargs['current_model'].model}")
        result = await next_handler(**input_kwargs)
        print("[model_call] done")
        return result

    async def on_system_prompt(
        self,
        agent: Agent,
        current_prompt: str,
    ) -> str:
        print(f"[system_prompt] length={len(current_prompt)}")
        return current_prompt

Execution Order

Onion hooks (on_reply, on_reasoning, on_acting, on_model_call) — the first middleware in the list is the outermost layer:
middlewares = [mw1, mw2]
# Call order:
# mw1 pre → mw2 pre → inner logic → mw2 post → mw1 post
For streaming / event-yielding hooks, the inner middleware sees each yielded event first:
mw1_pre → mw2_pre → mw2_event → mw1_event → ... → mw2_post → mw1_post
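The streaming order above follows from how nested async generators behave. The sketch below reproduces it in plain Python; `make_streaming_layer` and `core` are hypothetical stand-ins, and the simplified `next_handler` takes no arguments.

```python
import asyncio

order: list[str] = []


def make_streaming_layer(name: str):
    async def layer(next_handler):
        order.append(f"{name}_pre")
        async for event in next_handler():
            # The layer observes the event, then passes it outward.
            order.append(f"{name}_event({event})")
            yield event
        order.append(f"{name}_post")
    return layer


async def core():
    # Stand-in for the wrapped logic that produces events.
    yield "e1"
    yield "e2"


async def consume() -> list[str]:
    mw1 = make_streaming_layer("mw1")
    mw2 = make_streaming_layer("mw2")
    # mw1 is outermost: it iterates over mw2, which iterates over core.
    return [event async for event in mw1(lambda: mw2(core))]


events = asyncio.run(consume())
```

Each event travels from the core outward, so `mw2` records it before `mw1` does, and the post steps run in reverse order once the stream is exhausted.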
Transformer hooks (on_system_prompt) — middlewares chain left to right:
middlewares = [mw1, mw2]
# original_prompt → mw1.on_system_prompt() → mw2.on_system_prompt() → final
The overall execution order of all hooks within a single reply follows the agent lifecycle:
on_reply
  └── per ReAct round:
        ├── compress_context() → on_system_prompt (token counting)
        ├── on_reasoning
        │     ├── _prepare_model_input() → on_system_prompt
        │     └── on_model_call
        └── on_acting (once per tool call in this round)

Practical Examples

Timing middleware

The middleware below records the elapsed time of every model call:
import time
from agentscope.middleware import MiddlewareBase

class TimingMiddleware(MiddlewareBase):
    async def on_model_call(self, agent, input_kwargs, next_handler):
        model_name = input_kwargs["current_model"].model
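        start = time.perf_counter()  # high-resolution clock suited to measuring intervals

        # Forward the original arguments unchanged to the next layer.
        result = await next_handler(**input_kwargs)

        elapsed = time.perf_counter() - start
        print(f"[timing] {agent.name}/{model_name}: {elapsed:.2f}s")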
        return result
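The type hints in the Custom Middleware section indicate that a model call may resolve to either a single ChatResponse or an async generator of responses. In the streaming case, awaiting the handler returns as soon as the stream object exists, so the measured interval may exclude the time spent receiving chunks. Below is a minimal sketch of one way to cover both cases, written against stand-in handlers rather than a real model; `timed_call`, `report`, and the fake handlers are hypothetical names.

```python
import asyncio
import time
from collections.abc import AsyncIterator


async def timed_call(next_handler, report, **kwargs):
    """Call next_handler and report the elapsed time once the result is complete."""
    start = time.perf_counter()
    result = await next_handler(**kwargs)

    if not isinstance(result, AsyncIterator):
        # Non-streaming: the response is already complete.
        report(time.perf_counter() - start)
        return result

    async def timed_stream():
        try:
            async for chunk in result:
                yield chunk
        finally:
            # Runs when the stream is exhausted or closed early.
            report(time.perf_counter() - start)

    return timed_stream()


async def fake_streaming_handler(**kwargs):
    async def stream():
        for chunk in ("a", "b"):
            await asyncio.sleep(0.01)
            yield chunk
    return stream()


async def fake_plain_handler(**kwargs):
    return "done"


durations: list[float] = []


async def demo():
    stream = await timed_call(fake_streaming_handler, durations.append)
    chunks = [chunk async for chunk in stream]
    plain = await timed_call(fake_plain_handler, durations.append)
    return chunks, plain


chunks, plain = asyncio.run(demo())
```

Inside an `on_model_call` hook, the same wrapping could be applied to the value returned by `next_handler` before returning it to the caller.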

Rate-limiting middleware

The middleware below enforces a minimum interval between two model calls:
import asyncio
import time
from agentscope.middleware import MiddlewareBase

class RateLimitMiddleware(MiddlewareBase):
    def __init__(self, min_interval: float = 1.0):
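        self._last_call = float("-inf")  # no previous call yet
        self._min_interval = min_interval

    async def on_model_call(self, agent, input_kwargs, next_handler):
        # Use a monotonic clock so system clock adjustments cannot skew the interval.
        now = time.monotonic()
        wait = self._min_interval - (now - self._last_call)
        if wait > 0:
            await asyncio.sleep(wait)
        self._last_call = time.monotonic()
        # Forward the original arguments unchanged to the next layer.
        return await next_handler(**input_kwargs)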
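If several model calls can be in flight at once (for example, when one middleware instance is shared by agents running concurrently), two callers may read `_last_call` before either updates it, and both proceed without waiting. The sketch below shows one way to serialize the check with an `asyncio.Lock`. `IntervalGate` is a hypothetical helper that does not depend on AgentScope; a middleware could call `await gate.wait_turn()` before forwarding to `next_handler`.

```python
import asyncio
import time


class IntervalGate:
    """Let callers through one at a time, at least `min_interval` seconds apart."""

    def __init__(self, min_interval: float = 1.0) -> None:
        self._min_interval = min_interval
        self._last_call = float("-inf")  # no previous call yet
        self._lock = asyncio.Lock()

    async def wait_turn(self) -> None:
        # Holding the lock while sleeping makes later callers queue up
        # instead of all computing the same wait time.
        async with self._lock:
            wait = self._min_interval - (time.monotonic() - self._last_call)
            if wait > 0:
                await asyncio.sleep(wait)
            self._last_call = time.monotonic()


async def demo(n: int = 3, interval: float = 0.05) -> list[float]:
    gate = IntervalGate(interval)  # created inside the running event loop
    stamps: list[float] = []

    async def call() -> None:
        await gate.wait_turn()
        stamps.append(time.monotonic())

    await asyncio.gather(*(call() for _ in range(n)))
    return stamps


stamps = asyncio.run(demo())
gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
```

The trade-off is that callers are released strictly one after another, so a burst of calls is spread out rather than rejected.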

Dynamic system prompt middleware

The middleware below injects real-time context into the system prompt:
from datetime import datetime
from agentscope import Agent
from agentscope.middleware import MiddlewareBase

class DynamicContextMiddleware(MiddlewareBase):
    def __init__(self, context_fn):
        self._context_fn = context_fn

    async def on_system_prompt(self, agent, current_prompt):
        context = self._context_fn()
        return f"{current_prompt}\n\n## Current Context\n{context}"

agent = Agent(
    ...
    middlewares=[
        DynamicContextMiddleware(
            lambda: f"Time: {datetime.now().isoformat()}"
        ),
    ],
)

Model fallback middleware

The middleware below switches to a fallback model when the primary one fails:
from agentscope.middleware import MiddlewareBase

class ModelFallbackMiddleware(MiddlewareBase):
    def __init__(self, fallback_model):
        self._fallback = fallback_model

    async def on_model_call(self, agent, input_kwargs, next_handler):
        try:
            return await next_handler(**input_kwargs)
        except Exception as e:
            print(f"Primary model failed: {e}, switching to fallback")
            # Re-run the call with only current_model overridden.
            return await next_handler(
                **{**input_kwargs, "current_model": self._fallback},
            )
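Fallback logic like this can be exercised without a real model by passing a stand-in `next_handler`. The sketch below is a minimal illustration under that assumption: `call_with_fallback` and `flaky_handler` are hypothetical names, and plain strings stand in for model objects.

```python
import asyncio


async def call_with_fallback(input_kwargs, next_handler, fallback_model):
    """Try the primary model first; retry once with the fallback on failure."""
    try:
        return await next_handler(**input_kwargs)
    except Exception as exc:
        print(f"Primary model failed: {exc!r}, switching to fallback")
        # Keep every original field and override only current_model.
        retry_kwargs = {**input_kwargs, "current_model": fallback_model}
        return await next_handler(**retry_kwargs)


calls: list[str] = []


async def flaky_handler(**kwargs):
    # Stand-in for the wrapped model call: fails for the primary model only.
    calls.append(kwargs["current_model"])
    if kwargs["current_model"] == "primary":
        raise RuntimeError("service unavailable")
    return f"answer from {kwargs['current_model']}"


result = asyncio.run(
    call_with_fallback(
        {"messages": [], "current_model": "primary"},
        flaky_handler,
        "backup",
    )
)
```

Note that a `try`/`except` around the awaited call only covers errors raised before the handler returns. If the model streams its response, an error raised while the stream is being consumed would surface later, outside this block.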