Documentation Index
Fetch the complete documentation index at: https://docs.agentscope.io/llms.txt
Use this file to discover all available pages before exploring further.
Overview
The model layer is organized as a two-tier hierarchy: a Credential at the top, and the model families a provider exposes beneath it — Chat Model, TTS, Embedding, and Realtime Model.Credential
ChatModelBase
OpenAIChatModel
AnthropicChatModel
DashScopeChatModel
...
TTSModelBase (coming soon)
EmbeddingModelBase (coming soon)
RealtimeModelBase (coming soon)
api_key, base_url, …). From a credential, you can retrieve the list of available models for each model family that provider supports.
This layering mirrors the natural frontend flow — register a credential first, then pick a model from under it — letting the UI authenticate once and surface every model family the provider supports.
Chat Model
A Chat Model is the LLM that drives an agent’s conversation and tool calls, accepting and producing multimodal content beyond plain text. AgentScope currently ships the following chat model classes:| Provider | Model Class | Highlights |
|---|---|---|
| OpenAI | OpenAIChatModel | Chat Completions API, compatible with vLLM and OpenAI-compatible endpoints |
| OpenAI (Responses) | OpenAIResponseModel | Responses API with native reasoning support (o3, o4-mini) |
| Anthropic | AnthropicChatModel | Claude models with extended thinking and prompt caching |
| DashScope | DashScopeChatModel | Qwen models, multimodal (vision/audio/video), reasoning |
| DeepSeek | DeepSeekChatModel | OpenAI-compatible with prompt cache hit tokens |
| Gemini | GeminiChatModel | Google Gemini models with multimodal support |
| Moonshot | MoonshotChatModel | Kimi models (OpenAI-compatible) |
| xAI | XAIChatModel | Grok models with native reasoning effort |
| Ollama | OllamaChatModel | Local LLM hosting, credential is optional |
Create Chat Model
Every chat model takes a credential, a model name, and an optional provider-specificParameters object. The three tabs below show typical setups for streaming, tool calling, and reasoning:
| Argument | Type | Description |
|---|---|---|
credential | CredentialBase | Provider-specific credential |
model | str | Model identifier (e.g. "qwen-plus") |
parameters | Parameters | None | Provider-specific parameters such as temperature, thinking_enable, parallel_tool_calls |
stream | bool | Whether to stream output |
max_retries | int | Maximum API retries on failure |
context_size | int | Context window used for context compression |
formatter | FormatterBase | None | Override message formatter |
Call Chat Model
Invoke the model by calling it with a list ofMsg objects, plus optional tools and tool_choice:
stream setting:
stream=False— awaits a singleChatResponsecarrying the full output.stream=True— awaits anAsyncGenerator[ChatResponse, None]. Intermediate chunks (is_last=False) carry only the delta generated in that step. So that callers don’t have to accumulate deltas themselves, AgentScope appends one final chunk withis_last=Truethat carries the full accumulated content.
ChatResponse carries content blocks (TextBlock, ThinkingBlock, ToolCallBlock, DataBlock), an is_last flag, and a ChatUsage recording token counts and elapsed time.
Generate Structured Output
When you need a JSON object that conforms to a Pydantic model or JSON schema, callgenerate_structured_output instead of __call__. It returns a StructuredResponse whose content is a validated dict matching the schema:
generate_structured_output synthesizes a forced tool call from the schema, then validates and repairs the model’s response.Formatter
A formatter translates AgentScope’sMsg objects into the list[dict] payload that each provider’s API expects. It is configured via the optional formatter argument on the chat model constructor. Every provider ships two built-in variants:
| Variant | Use Case |
|---|---|
| ChatFormatter (default) | Standard single-agent dialog. Each Msg maps 1:1 to an API message, preserving native roles (user, assistant, system). |
| MultiAgentFormatter | Multi-agent scenarios such as debate or moderation. Consecutive agent messages are grouped and wrapped in <history> tags with the sender’s name, while tool call / result sequences keep their native API format. |
FormatterBase and pass an instance through the same formatter argument.
Custom Provider
You can extend AgentScope with your own model provider by implementing a credential and a chat model, then registering the credential.Step 1: Define the Credential
SubclassCredentialBase with a unique type discriminator and implement get_chat_model_class():
Step 2: Implement the Chat Model
SubclassChatModelBase, define a Parameters inner class, and implement _call_api:
Step 3: Add Model Cards (optional)
Drop YAML files into a_models/ directory next to your model implementation. Each file describes one model — its capabilities (input_types, output_types), limits (context_size, output_size), and any per-model parameter_overrides:
MyProviderChatModel.list_models() then loads every YAML in that directory. To pull cards from a different location — for example, a registry your application maintains separately — pass custom_yaml_dir:
Integrate with Frontend
What is ModelCard
ModelCard is a declarative description of a model’s capabilities and constraints, designed to drive the frontend — model selectors, parameter forms, and feature toggles can be rendered dynamically without hardcoding any provider-specific knowledge.
Each ModelCard contains:
| Field | Type | Description |
|---|---|---|
name | str | Model identifier (e.g. "claude-sonnet-4-6") |
label | str | Human-readable display name (e.g. "Claude Sonnet 4.6") |
status | "active" | "deprecated" | "sunset" | Model lifecycle status |
input_types | list[str] | Accepted input MIME types — used by the frontend to filter attachment uploads (e.g. only show an image button when image/* is supported) |
output_types | list[str] | Output MIME types the model can produce — advertises capabilities such as a thinking toggle when application/x-thinking is present |
context_size | int | Maximum context window in tokens |
output_size | int | Maximum output tokens |
parameter_schema | dict | Final JSON Schema for the parameter form — base schema merged with per-model overrides (see below) |
parameters_overrides | dict[str, dict] | The raw per-model overrides, before merging |
input_types and output_types use MIME types to describe modality. Common values:
| MIME Type | Meaning |
|---|---|
text/plain | Text |
application/x-thinking | Reasoning / chain-of-thought |
image/* (e.g. image/png, image/jpeg) | Image |
audio/* (e.g. audio/wav, audio/mp3) | Audio |
video/* (e.g. video/mp4) | Video |
claude-sonnet-4-6:
Parameter schema and overrides
Theparameter_schema exposed to the frontend is built in two layers:
- Base schema — auto-derived from the chat model’s
Parametersclass viamodel_json_schema(). This lists every adjustable parameter (temperature,max_tokens,thinking_enable, …) along with its type and the API-wide range. - Per-model overrides — the YAML’s
parameter_overridesblock is merged on top, field by field.
max_tokens, but each one has a different ceiling. Overrides let a card tighten a range, pin a default, or hide a parameter that doesn’t apply.
| Override syntax | Effect |
|---|---|
param: { ... } | Shallow-merge into the base field (e.g. max_tokens: {maximum: 16384}) |
param: { hidden: true } | Hide the parameter from the frontend |
param: null | Remove the parameter entirely |
Retrieve ModelCards
You retrieve model cards by callinglist_models() on either the credential class or the model class. Internally, CredentialBase.list_models() delegates to its linked ChatModelBase subclass (obtained via get_chat_model_class()), which loads YAML card definitions from its _models/ directory.
get_chat_model_class() returns the corresponding ChatModelBase subclass, which in turn knows where to find its model card YAML files: