Model - AgentScope

Overview

The model layer is organized as a two-tier hierarchy: a Credential at the top, and the model families a provider exposes beneath it — Chat Model, TTS, Embedding, and Realtime Model.

Credential

ChatModelBase

OpenAIChatModel

AnthropicChatModel

DashScopeChatModel

...

TTSModelBase (coming soon)

EmbeddingModelBase (coming soon)

RealtimeModelBase (coming soon)

A Credential carries the API authentication fields a provider requires (api_key, base_url, …). From a credential, you can retrieve the list of available models for each model family that provider supports. This layering mirrors the natural frontend flow — register a credential first, then pick a model from under it — letting the UI authenticate once and surface every model family the provider supports.

Chat Model

A Chat Model is the LLM that drives an agent’s conversation and tool calls, accepting and producing multimodal content beyond plain text. AgentScope currently ships the following chat model classes:

Provider	Model Class	Highlights
OpenAI	`OpenAIChatModel`	Chat Completions API, compatible with vLLM and OpenAI-compatible endpoints
OpenAI (Responses)	`OpenAIResponseModel`	Responses API with native reasoning support (o3, o4-mini)
Anthropic	`AnthropicChatModel`	Claude models with extended thinking and prompt caching
DashScope	`DashScopeChatModel`	Qwen models, multimodal (vision/audio/video), reasoning
DeepSeek	`DeepSeekChatModel`	OpenAI-compatible with prompt cache hit tokens
Gemini	`GeminiChatModel`	Google Gemini models with multimodal support
Moonshot	`MoonshotChatModel`	Kimi models (OpenAI-compatible)
xAI	`XAIChatModel`	Grok models with native reasoning effort
Ollama	`OllamaChatModel`	Local LLM hosting, credential is optional

Create Chat Model

Every chat model takes a credential, a model name, and an optional provider-specific Parameters object. The three tabs below show typical setups for streaming, tool calling, and reasoning:

import os
from agentscope.model import DashScopeChatModel
from agentscope.credential import DashScopeCredential

model = DashScopeChatModel(
    credential=DashScopeCredential(api_key=os.environ["DASHSCOPE_API_KEY"]),
    model="qwen-plus",
    stream=True,
)

Common constructor arguments shared by every chat model:

Argument	Type	Description
`credential`	`CredentialBase`	Provider-specific credential
`model`	`str`	Model identifier (e.g. `"qwen-plus"`)
`parameters`	`Parameters \| None`	Provider-specific parameters such as `temperature`, `thinking_enable`, `parallel_tool_calls`
`stream`	`bool`	Whether to stream output
`max_retries`	`int`	Maximum API retries on failure
`context_size`	`int`	Context window used for context compression
`formatter`	`FormatterBase \| None`	Override message formatter

Call Chat Model

Invoke the model by calling it with a list of Msg objects, plus optional tools and tool_choice:

async def __call__(
    self,
    messages: list[Msg],
    tools: list[dict] | None = None,
    tool_choice: ToolChoice | None = None,
    **kwargs: Any,
) -> ChatResponse | AsyncGenerator[ChatResponse, None]:

The return type follows the model’s stream setting:

stream=False — awaits a single ChatResponse carrying the full output.
stream=True — awaits an AsyncGenerator[ChatResponse, None]. Intermediate chunks (is_last=False) carry only the delta generated in that step. So that callers don’t have to accumulate deltas themselves, AgentScope appends one final chunk with is_last=True that carries the full accumulated content.

import asyncio
import os
from agentscope.model import DashScopeChatModel
from agentscope.credential import DashScopeCredential
from agentscope.message import UserMsg

async def main():
    model = DashScopeChatModel(
        credential=DashScopeCredential(api_key=os.environ["DASHSCOPE_API_KEY"]),
        model="qwen-plus",
        stream=True,
    )
    msgs = [UserMsg(name="user", content="Count from 1 to 5.")]

    async for chunk in await model(msgs):
        if chunk.is_last:
            print("Final:", chunk.content)   # full accumulated content
        else:
            print("Delta:", chunk.content)   # delta only

asyncio.run(main())

A representative streaming trace, illustrating the delta-then-accumulated pattern:

Delta: [TextBlock(text='1')]
Delta: [TextBlock(text=', 2,')]
Delta: [TextBlock(text=' 3, ')]
Delta: [TextBlock(text='4, 5')]
Final: [TextBlock(text='1, 2, 3, 4, 5')]

Each ChatResponse carries content blocks (TextBlock, ThinkingBlock, ToolCallBlock, DataBlock), an is_last flag, and a ChatUsage recording token counts and elapsed time.

Generate Structured Output

When you need a JSON object that conforms to a Pydantic model or JSON schema, call generate_structured_output instead of __call__. It returns a StructuredResponse whose content is a validated dict matching the schema:

import asyncio
import os
from pydantic import BaseModel
from agentscope.model import DashScopeChatModel
from agentscope.credential import DashScopeCredential
from agentscope.message import UserMsg

class WeatherInfo(BaseModel):
    city: str
    temperature: float
    unit: str

async def main():
    model = DashScopeChatModel(
        credential=DashScopeCredential(api_key=os.environ["DASHSCOPE_API_KEY"]),
        model="qwen-plus",
        stream=False,
    )
    response = await model.generate_structured_output(
        messages=[UserMsg(name="user", content="What's the weather in Shanghai?")],
        structured_model=WeatherInfo,
    )
    print(response.content)  # validated dict matching WeatherInfo

asyncio.run(main())

generate_structured_output synthesizes a forced tool call from the schema, then validates and repairs the model’s response.

Formatter

A formatter translates AgentScope’s Msg objects into the list[dict] payload that each provider’s API expects. It is configured via the optional formatter argument on the chat model constructor. Every provider ships two built-in variants:

Variant	Use Case
ChatFormatter (default)	Standard single-agent dialog. Each `Msg` maps 1:1 to an API message, preserving native roles (`user`, `assistant`, `system`).
MultiAgentFormatter	Multi-agent scenarios such as debate or moderation. Consecutive agent messages are grouped and wrapped in `<history>` tags with the sender’s name, while tool call / result sequences keep their native API format.

Switch to multi-agent mode by passing the MultiAgent variant — no agent code changes are required:

import os
from agentscope.model import OpenAIChatModel
from agentscope.credential import OpenAICredential
from agentscope.formatter import OpenAIMultiAgentFormatter

model = OpenAIChatModel(
    credential=OpenAICredential(api_key=os.environ["OPENAI_API_KEY"]),
    model="gpt-4.1",
    formatter=OpenAIMultiAgentFormatter(),
)

For non-standard payload shapes (e.g. a provider whose API doesn’t follow the OpenAI or Anthropic conventions), subclass FormatterBase and pass an instance through the same formatter argument.

Custom Provider

You can extend AgentScope with your own model provider by implementing a credential and a chat model, then registering the credential.

Step 1: Define the Credential

Subclass CredentialBase with a unique type discriminator and implement get_chat_model_class():

from typing import Literal, Type, TYPE_CHECKING
from pydantic import ConfigDict, Field, SecretStr
from agentscope.credential import CredentialBase

if TYPE_CHECKING:
    from agentscope.model import ChatModelBase

class MyProviderCredential(CredentialBase):
    model_config = ConfigDict(title="My Provider API")
    type: Literal["my_provider_credential"] = "my_provider_credential"

    api_key: SecretStr = Field(description="API key for My Provider.")
    base_url: str = Field(default="https://api.myprovider.com/v1")

    @classmethod
    def get_chat_model_class(cls) -> Type["ChatModelBase"]:
        from .my_model import MyProviderChatModel
        return MyProviderChatModel

Step 2: Implement the Chat Model

Subclass ChatModelBase, define a Parameters inner class, and implement _call_api:

from typing import Literal, Any, AsyncGenerator
from pydantic import BaseModel, Field
from agentscope.model import ChatModelBase, ChatResponse
from agentscope.message import Msg
from agentscope.tool import ToolChoice
from agentscope.formatter import FormatterBase, OpenAIChatFormatter

class MyProviderChatModel(ChatModelBase):
    class Parameters(BaseModel):
        max_tokens: int | None = Field(default=None, gt=0)
        temperature: float | None = Field(default=None, ge=0, le=2)

    type: Literal["my_provider_chat"] = "my_provider_chat"

    def __init__(
        self,
        credential: "MyProviderCredential",
        model: str,
        parameters: Parameters | None = None,
        stream: bool = True,
        max_retries: int = 3,
        context_size: int = 128000,
        formatter: FormatterBase | None = None,
    ) -> None:
        super().__init__(
            credential=credential,
            model=model,
            parameters=parameters or self.Parameters(),
            stream=stream,
            max_retries=max_retries,
            context_size=context_size,
        )
        # If your API follows the OpenAI format, reuse OpenAIChatFormatter;
        # otherwise implement your own FormatterBase subclass.
        self.formatter = formatter or OpenAIChatFormatter()

    async def _call_api(
        self,
        model_name: str,
        messages: list[Msg],
        tools: list[dict] | None = None,
        tool_choice: ToolChoice | None = None,
        **kwargs: Any,
    ) -> ChatResponse | AsyncGenerator[ChatResponse, None]:
        formatted_messages = await self.formatter.format(messages)
        # Call your provider's API using self.credential.api_key, etc.
        ...

Step 3: Add Model Cards (optional)

Drop YAML files into a _models/ directory next to your model implementation. Each file describes one model — its capabilities (input_types, output_types), limits (context_size, output_size), and any per-model parameter_overrides:

name: my-model-v1
label: My Model V1
status: active
input_types:
  - text/plain
output_types:
  - text/plain
context_size: 128000
output_size: 4096
parameter_overrides:
  max_tokens: {"maximum": 4096}

MyProviderChatModel.list_models() then loads every YAML in that directory. To pull cards from a different location — for example, a registry your application maintains separately — pass custom_yaml_dir:

cards = MyProviderChatModel.list_models(custom_yaml_dir="/path/to/cards")

Integrate with Frontend

What is ModelCard

ModelCard is a declarative description of a model’s capabilities and constraints, designed to drive the frontend — model selectors, parameter forms, and feature toggles can be rendered dynamically without hardcoding any provider-specific knowledge. Each ModelCard contains:

Field	Type	Description
`name`	`str`	Model identifier (e.g. `"claude-sonnet-4-6"`)
`label`	`str`	Human-readable display name (e.g. `"Claude Sonnet 4.6"`)
`status`	`"active" \| "deprecated" \| "sunset"`	Model lifecycle status
`input_types`	`list[str]`	Accepted input MIME types — used by the frontend to filter attachment uploads (e.g. only show an image button when `image/*` is supported)
`output_types`	`list[str]`	Output MIME types the model can produce — advertises capabilities such as a thinking toggle when `application/x-thinking` is present
`context_size`	`int`	Maximum context window in tokens
`output_size`	`int`	Maximum output tokens
`parameter_schema`	`dict`	Final JSON Schema for the parameter form — base schema merged with per-model overrides (see below)
`parameters_overrides`	`dict[str, dict]`	The raw per-model overrides, before merging

input_types and output_types use MIME types to describe modality. Common values:

MIME Type	Meaning
`text/plain`	Text
`application/x-thinking`	Reasoning / chain-of-thought
`image/*` (e.g. `image/png`, `image/jpeg`)	Image
`audio/*` (e.g. `audio/wav`, `audio/mp3`)	Audio
`video/*` (e.g. `video/mp4`)	Video

A typical YAML card for claude-sonnet-4-6:

name: claude-sonnet-4-6
label: Claude Sonnet 4.6
status: active

input_types:
  - text/plain
  - image/jpeg
  - image/png
  - image/gif
  - image/webp

output_types:
  - text/plain
  - application/x-thinking

context_size: 1000000
output_size: 65536

parameter_overrides:
  max_tokens: {"maximum": 65536}

Parameter schema and overrides

The parameter_schema exposed to the frontend is built in two layers:

Base schema — auto-derived from the chat model’s Parameters class via model_json_schema(). This lists every adjustable parameter (temperature, max_tokens, thinking_enable, …) along with its type and the API-wide range.
Per-model overrides — the YAML’s parameter_overrides block is merged on top, field by field.

Overrides matter because adjustable ranges are not uniform across an API: every Qwen model accepts max_tokens, but each one has a different ceiling. Overrides let a card tighten a range, pin a default, or hide a parameter that doesn’t apply.

Override syntax	Effect
`param: { ... }`	Shallow-merge into the base field (e.g. `max_tokens: {maximum: 16384}`)
`param: { hidden: true }`	Hide the parameter from the frontend
`param: null`	Remove the parameter entirely

Retrieve ModelCards

You retrieve model cards by calling list_models() on either the credential class or the model class. Internally, CredentialBase.list_models() delegates to its linked ChatModelBase subclass (obtained via get_chat_model_class()), which loads YAML card definitions from its _models/ directory.

from agentscope.credential import DashScopeCredential
from agentscope.model import AnthropicChatModel

# Via credential class
cards = DashScopeCredential.list_models()

# Or directly on the model class
cards = AnthropicChatModel.list_models()

for card in cards:
    print(f"{card.name}: context={card.context_size}, inputs={card.input_types}")

The credential’s get_chat_model_class() returns the corresponding ChatModelBase subclass, which in turn knows where to find its model card YAML files:

model_cls = DashScopeCredential.get_chat_model_class()  # -> DashScopeChatModel
cards = model_cls.list_models()                          # -> list[ModelCard]

This design allows the frontend to discover available models, their capabilities, and valid parameter ranges — all from a single credential, without any hardcoded provider logic.

TTS

Coming soon — we are migrating TTS support from v1.0 to v2.0.

Embedding

Coming soon — we are migrating Embedding support from v1.0 to v2.0.

Realtime Model

Coming soon — we are migrating Realtime Model support from v1.0 to v2.0.

Documentation Index

​Overview

​Chat Model

​Create Chat Model

​Call Chat Model

​Generate Structured Output

​Formatter

​Custom Provider

​Step 1: Define the Credential

​Step 2: Implement the Chat Model

​Step 3: Add Model Cards (optional)

​Integrate with Frontend

​What is ModelCard

​Parameter schema and overrides

​Retrieve ModelCards

​TTS

​Embedding

​Realtime Model

Overview

Chat Model

Create Chat Model

Call Chat Model

Generate Structured Output

Formatter

Custom Provider

Step 1: Define the Credential

Step 2: Implement the Chat Model

Step 3: Add Model Cards (optional)

Integrate with Frontend

What is ModelCard

Parameter schema and overrides

Retrieve ModelCards

TTS

Embedding

Realtime Model