Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.agentscope.io/llms.txt

Use this file to discover all available pages before exploring further.

Overview

The model layer is organized as a two-tier hierarchy: a Credential at the top, and the model families a provider exposes beneath it — Chat Model, TTS, Embedding, and Realtime Model.
Credential
ChatModelBase
OpenAIChatModel
AnthropicChatModel
DashScopeChatModel
...
TTSModelBase (coming soon)
EmbeddingModelBase (coming soon)
RealtimeModelBase (coming soon)
A Credential carries the API authentication fields a provider requires (api_key, base_url, …). From a credential, you can retrieve the list of available models for each model family that provider supports. This layering mirrors the natural frontend flow — register a credential first, then pick a model from under it — letting the UI authenticate once and surface every model family the provider supports.

Chat Model

A Chat Model is the LLM that drives an agent’s conversation and tool calls, accepting and producing multimodal content beyond plain text. AgentScope currently ships the following chat model classes:
ProviderModel ClassHighlights
OpenAIOpenAIChatModelChat Completions API, compatible with vLLM and OpenAI-compatible endpoints
OpenAI (Responses)OpenAIResponseModelResponses API with native reasoning support (o3, o4-mini)
AnthropicAnthropicChatModelClaude models with extended thinking and prompt caching
DashScopeDashScopeChatModelQwen models, multimodal (vision/audio/video), reasoning
DeepSeekDeepSeekChatModelOpenAI-compatible with prompt cache hit tokens
GeminiGeminiChatModelGoogle Gemini models with multimodal support
MoonshotMoonshotChatModelKimi models (OpenAI-compatible)
xAIXAIChatModelGrok models with native reasoning effort
OllamaOllamaChatModelLocal LLM hosting, credential is optional

Create Chat Model

Every chat model takes a credential, a model name, and an optional provider-specific Parameters object. The three tabs below show typical setups for streaming, tool calling, and reasoning:
import os
from agentscope.model import DashScopeChatModel
from agentscope.credential import DashScopeCredential

model = DashScopeChatModel(
    credential=DashScopeCredential(api_key=os.environ["DASHSCOPE_API_KEY"]),
    model="qwen-plus",
    stream=True,
)
Common constructor arguments shared by every chat model:
ArgumentTypeDescription
credentialCredentialBaseProvider-specific credential
modelstrModel identifier (e.g. "qwen-plus")
parametersParameters | NoneProvider-specific parameters such as temperature, thinking_enable, parallel_tool_calls
streamboolWhether to stream output
max_retriesintMaximum API retries on failure
context_sizeintContext window used for context compression
formatterFormatterBase | NoneOverride message formatter

Call Chat Model

Invoke the model by calling it with a list of Msg objects, plus optional tools and tool_choice:
async def __call__(
    self,
    messages: list[Msg],
    tools: list[dict] | None = None,
    tool_choice: ToolChoice | None = None,
    **kwargs: Any,
) -> ChatResponse | AsyncGenerator[ChatResponse, None]:
The return type follows the model’s stream setting:
  • stream=False — awaits a single ChatResponse carrying the full output.
  • stream=True — awaits an AsyncGenerator[ChatResponse, None]. Intermediate chunks (is_last=False) carry only the delta generated in that step. So that callers don’t have to accumulate deltas themselves, AgentScope appends one final chunk with is_last=True that carries the full accumulated content.
import asyncio
import os
from agentscope.model import DashScopeChatModel
from agentscope.credential import DashScopeCredential
from agentscope.message import UserMsg

async def main():
    model = DashScopeChatModel(
        credential=DashScopeCredential(api_key=os.environ["DASHSCOPE_API_KEY"]),
        model="qwen-plus",
        stream=True,
    )
    msgs = [UserMsg(name="user", content="Count from 1 to 5.")]

    async for chunk in await model(msgs):
        if chunk.is_last:
            print("Final:", chunk.content)   # full accumulated content
        else:
            print("Delta:", chunk.content)   # delta only

asyncio.run(main())
A representative streaming trace, illustrating the delta-then-accumulated pattern:
Delta: [TextBlock(text='1')]
Delta: [TextBlock(text=', 2,')]
Delta: [TextBlock(text=' 3, ')]
Delta: [TextBlock(text='4, 5')]
Final: [TextBlock(text='1, 2, 3, 4, 5')]
Each ChatResponse carries content blocks (TextBlock, ThinkingBlock, ToolCallBlock, DataBlock), an is_last flag, and a ChatUsage recording token counts and elapsed time.

Generate Structured Output

When you need a JSON object that conforms to a Pydantic model or JSON schema, call generate_structured_output instead of __call__. It returns a StructuredResponse whose content is a validated dict matching the schema:
import asyncio
import os
from pydantic import BaseModel
from agentscope.model import DashScopeChatModel
from agentscope.credential import DashScopeCredential
from agentscope.message import UserMsg

class WeatherInfo(BaseModel):
    city: str
    temperature: float
    unit: str

async def main():
    model = DashScopeChatModel(
        credential=DashScopeCredential(api_key=os.environ["DASHSCOPE_API_KEY"]),
        model="qwen-plus",
        stream=False,
    )
    response = await model.generate_structured_output(
        messages=[UserMsg(name="user", content="What's the weather in Shanghai?")],
        structured_model=WeatherInfo,
    )
    print(response.content)  # validated dict matching WeatherInfo

asyncio.run(main())
generate_structured_output synthesizes a forced tool call from the schema, then validates and repairs the model’s response.

Formatter

A formatter translates AgentScope’s Msg objects into the list[dict] payload that each provider’s API expects. It is configured via the optional formatter argument on the chat model constructor. Every provider ships two built-in variants:
VariantUse Case
ChatFormatter (default)Standard single-agent dialog. Each Msg maps 1:1 to an API message, preserving native roles (user, assistant, system).
MultiAgentFormatterMulti-agent scenarios such as debate or moderation. Consecutive agent messages are grouped and wrapped in <history> tags with the sender’s name, while tool call / result sequences keep their native API format.
Switch to multi-agent mode by passing the MultiAgent variant — no agent code changes are required:
import os
from agentscope.model import OpenAIChatModel
from agentscope.credential import OpenAICredential
from agentscope.formatter import OpenAIMultiAgentFormatter

model = OpenAIChatModel(
    credential=OpenAICredential(api_key=os.environ["OPENAI_API_KEY"]),
    model="gpt-4.1",
    formatter=OpenAIMultiAgentFormatter(),
)
For non-standard payload shapes (e.g. a provider whose API doesn’t follow the OpenAI or Anthropic conventions), subclass FormatterBase and pass an instance through the same formatter argument.

Custom Provider

You can extend AgentScope with your own model provider by implementing a credential and a chat model, then registering the credential.

Step 1: Define the Credential

Subclass CredentialBase with a unique type discriminator and implement get_chat_model_class():
from typing import Literal, Type, TYPE_CHECKING
from pydantic import ConfigDict, Field, SecretStr
from agentscope.credential import CredentialBase

if TYPE_CHECKING:
    from agentscope.model import ChatModelBase

class MyProviderCredential(CredentialBase):
    model_config = ConfigDict(title="My Provider API")
    type: Literal["my_provider_credential"] = "my_provider_credential"

    api_key: SecretStr = Field(description="API key for My Provider.")
    base_url: str = Field(default="https://api.myprovider.com/v1")

    @classmethod
    def get_chat_model_class(cls) -> Type["ChatModelBase"]:
        from .my_model import MyProviderChatModel
        return MyProviderChatModel

Step 2: Implement the Chat Model

Subclass ChatModelBase, define a Parameters inner class, and implement _call_api:
from typing import Literal, Any, AsyncGenerator
from pydantic import BaseModel, Field
from agentscope.model import ChatModelBase, ChatResponse
from agentscope.message import Msg
from agentscope.tool import ToolChoice
from agentscope.formatter import FormatterBase, OpenAIChatFormatter

class MyProviderChatModel(ChatModelBase):
    class Parameters(BaseModel):
        max_tokens: int | None = Field(default=None, gt=0)
        temperature: float | None = Field(default=None, ge=0, le=2)

    type: Literal["my_provider_chat"] = "my_provider_chat"

    def __init__(
        self,
        credential: "MyProviderCredential",
        model: str,
        parameters: Parameters | None = None,
        stream: bool = True,
        max_retries: int = 3,
        context_size: int = 128000,
        formatter: FormatterBase | None = None,
    ) -> None:
        super().__init__(
            credential=credential,
            model=model,
            parameters=parameters or self.Parameters(),
            stream=stream,
            max_retries=max_retries,
            context_size=context_size,
        )
        # If your API follows the OpenAI format, reuse OpenAIChatFormatter;
        # otherwise implement your own FormatterBase subclass.
        self.formatter = formatter or OpenAIChatFormatter()

    async def _call_api(
        self,
        model_name: str,
        messages: list[Msg],
        tools: list[dict] | None = None,
        tool_choice: ToolChoice | None = None,
        **kwargs: Any,
    ) -> ChatResponse | AsyncGenerator[ChatResponse, None]:
        formatted_messages = await self.formatter.format(messages)
        # Call your provider's API using self.credential.api_key, etc.
        ...

Step 3: Add Model Cards (optional)

Drop YAML files into a _models/ directory next to your model implementation. Each file describes one model — its capabilities (input_types, output_types), limits (context_size, output_size), and any per-model parameter_overrides:
name: my-model-v1
label: My Model V1
status: active
input_types:
  - text/plain
output_types:
  - text/plain
context_size: 128000
output_size: 4096
parameter_overrides:
  max_tokens: {"maximum": 4096}
MyProviderChatModel.list_models() then loads every YAML in that directory. To pull cards from a different location — for example, a registry your application maintains separately — pass custom_yaml_dir:
cards = MyProviderChatModel.list_models(custom_yaml_dir="/path/to/cards")

Integrate with Frontend

What is ModelCard

ModelCard is a declarative description of a model’s capabilities and constraints, designed to drive the frontend — model selectors, parameter forms, and feature toggles can be rendered dynamically without hardcoding any provider-specific knowledge. Each ModelCard contains:
FieldTypeDescription
namestrModel identifier (e.g. "claude-sonnet-4-6")
labelstrHuman-readable display name (e.g. "Claude Sonnet 4.6")
status"active" | "deprecated" | "sunset"Model lifecycle status
input_typeslist[str]Accepted input MIME types — used by the frontend to filter attachment uploads (e.g. only show an image button when image/* is supported)
output_typeslist[str]Output MIME types the model can produce — advertises capabilities such as a thinking toggle when application/x-thinking is present
context_sizeintMaximum context window in tokens
output_sizeintMaximum output tokens
parameter_schemadictFinal JSON Schema for the parameter form — base schema merged with per-model overrides (see below)
parameters_overridesdict[str, dict]The raw per-model overrides, before merging
input_types and output_types use MIME types to describe modality. Common values:
MIME TypeMeaning
text/plainText
application/x-thinkingReasoning / chain-of-thought
image/* (e.g. image/png, image/jpeg)Image
audio/* (e.g. audio/wav, audio/mp3)Audio
video/* (e.g. video/mp4)Video
A typical YAML card for claude-sonnet-4-6:
name: claude-sonnet-4-6
label: Claude Sonnet 4.6
status: active

input_types:
  - text/plain
  - image/jpeg
  - image/png
  - image/gif
  - image/webp

output_types:
  - text/plain
  - application/x-thinking

context_size: 1000000
output_size: 65536

parameter_overrides:
  max_tokens: {"maximum": 65536}

Parameter schema and overrides

The parameter_schema exposed to the frontend is built in two layers:
  1. Base schema — auto-derived from the chat model’s Parameters class via model_json_schema(). This lists every adjustable parameter (temperature, max_tokens, thinking_enable, …) along with its type and the API-wide range.
  2. Per-model overrides — the YAML’s parameter_overrides block is merged on top, field by field.
Overrides matter because adjustable ranges are not uniform across an API: every Qwen model accepts max_tokens, but each one has a different ceiling. Overrides let a card tighten a range, pin a default, or hide a parameter that doesn’t apply.
Override syntaxEffect
param: { ... }Shallow-merge into the base field (e.g. max_tokens: {maximum: 16384})
param: { hidden: true }Hide the parameter from the frontend
param: nullRemove the parameter entirely

Retrieve ModelCards

You retrieve model cards by calling list_models() on either the credential class or the model class. Internally, CredentialBase.list_models() delegates to its linked ChatModelBase subclass (obtained via get_chat_model_class()), which loads YAML card definitions from its _models/ directory.
from agentscope.credential import DashScopeCredential
from agentscope.model import AnthropicChatModel

# Via credential class
cards = DashScopeCredential.list_models()

# Or directly on the model class
cards = AnthropicChatModel.list_models()

for card in cards:
    print(f"{card.name}: context={card.context_size}, inputs={card.input_types}")
The credential’s get_chat_model_class() returns the corresponding ChatModelBase subclass, which in turn knows where to find its model card YAML files:
model_cls = DashScopeCredential.get_chat_model_class()  # -> DashScopeChatModel
cards = model_cls.list_models()                          # -> list[ModelCard]
This design allows the frontend to discover available models, their capabilities, and valid parameter ranges — all from a single credential, without any hardcoded provider logic.

TTS

Coming soon — we are migrating TTS support from v1.0 to v2.0.

Embedding

Coming soon — we are migrating Embedding support from v1.0 to v2.0.

Realtime Model

Coming soon — we are migrating Realtime Model support from v1.0 to v2.0.