Skip to main content
AgentScope provides unified async abstractions for various AI models across different providers:

Chat Models

Core text generation with reasoning, streaming, and tools API support.

TTS Models

Convert text to speech with realtime and non-realtime options.

Realtime Models

Bidirectional WebSocket streaming for low-latency voice agents.

Embedding Models

Generate vector representations for retrieval and similarity search.

Chat Model

Chat models are the core of the agent, enabling it to generate streaming/non-streaming responses, perform reasoning, and call tools.
The streaming mode in AgentScope chat models is accumulative — each yielded response contains all content generated so far, not just the latest delta. This design simplifies consumption since you always have the complete current state without tracking deltas.
API ProviderClassDescription
OpenAIOpenAIChatModelThe OpenAI-compatible chat model, supporting OpenAI, Amazon OpenAI, vLLM, DeepSeek, and any model with an OpenAI-compatible API.
DashScopeDashScopeChatModelThe unified DashScope API that supports both chat models and multimodal models (e.g. qwen-vl, qwen3.5-plus).
AnthropicAnthropicChatModelAnthropic Claude models, supporting both chat and multimodal models (e.g. claude-2, claude-instant-100k).
GeminiGeminiChatModelGoogle Gemini models.
OllamaOllamaChatModelOllama’s local LLM hosting solution.
To support multi-agent conversations in a chatbot format, AgentScope designs a formatter layer that
  • converts AgentScope’s Msg objects into the expected input format for each LLM API, and
  • adopts multi-agent conversation context into the two-role chatbot format by prefixing messages with agent names and wrapping them in <history> tags.
Such formatters are distinguished by the suffix ChatFormatter (e.g., DashScopeChatFormatter) and MultiAgentFormatter (e.g., DashScopeMultiAgentFormatter) — the former is for two-party conversations (user + assistant), while the latter is for multi-agent conversations.
For detailed usage examples and a full provider reference table mapping each model class to its corresponding formatter, see Models.

TTS Model

TTS (Text-to-Speech) models convert text into audio. AgentScope supports both non-realtime and realtime TTS models:
API ProviderClassDescription
DashScopeDashScopeTTSModelNon-realtime TTS
DashScopeDashScopeRealtimeTTSModelRealtime TTS with streaming text input for minimal latency
DashScope CosyVoiceDashScopeCosyVoiceTTSModelNon-realtime TTS with enhanced expressiveness and naturalness via DashScope’s CosyVoice technology
DashScope CosyVoiceDashScopeCosyVoiceRealtimeTTSModelRealtime TTS with CosyVoice technology for the most natural and expressive speech synthesis
OpenAIOpenAITTSModelOpenAI’s TTS model, supporting high-quality speech synthesis with various voice options.
GeminiGeminiTTSModelGoogle Gemini’s TTS model, offering natural and expressive speech synthesis.

Realtime Model

Realtime models provide bidirectional, persistent communication over WebSocket, designed primarily for voice agent scenarios where the user speaks and the model responds with speech in real-time.
API ProviderClassDescription
OpenAIOpenAIRealtimeModelOpenAI’s realtime model, supporting audio and text input, tool use, and server-side VAD for voice activity detection.
DashScopeDashScopeRealtimeModelDashScope’s realtime model, supporting audio and image input for rich multimodal interactions.
GeminiGeminiRealtimeModelGoogle’s Gemini realtime model, supporting audio, text, image input, tool use, and server-side VAD for voice activity detection.

Embedding Model

Embedding models generate vector representations for text, images, and other data types. These embeddings are used for retrieval, similarity search, and as input features for downstream tasks.
API ProviderClassDescription
OpenAIOpenAITextEmbeddingOpenAI’s text embedding API.
DashScopeDashScopeTextEmbeddingDashScope’s text embedding model.
DashScopeMultiModalEmbeddingDashScope’s multimodal embedding model, generating unified embeddings for both text and images, enabling cross-modal retrieval and understanding.
GeminiGeminiTextEmbeddingGoogle Gemini’s text embedding model.
OllamaOllamaTextEmbeddingOllama’s local embedding model for text data.