AgentScope provides unified async abstractions for AI models across different providers.

**Documentation Index:** fetch the complete documentation index at https://docs.agentscope.io/llms.txt and use it to discover all available pages before exploring further.

- **Chat Models**: core text generation with reasoning, streaming, and tools API support.
- **TTS Models**: convert text to speech, with realtime and non-realtime options.
- **Realtime Models**: bidirectional WebSocket streaming for low-latency voice agents.
- **Embedding Models**: generate vector representations for retrieval and similarity search.
## Chat Model

Chat models are the core of an agent, enabling it to generate streaming or non-streaming responses, perform reasoning, and call tools.

The streaming mode in AgentScope chat models is accumulative: each yielded response contains all content generated so far, not just the latest delta. This design simplifies consumption, since you always have the complete current state without tracking deltas yourself (see the sketch after the table below).
| API Provider | Class | Description |
|---|---|---|
| OpenAI | OpenAIChatModel | The OpenAI-compatible chat model, supporting OpenAI, Azure OpenAI, vLLM, DeepSeek, and any model served through an OpenAI-compatible API. |
| DashScope | DashScopeChatModel | The unified DashScope API that supports both chat models and multimodal models (e.g. qwen-vl, qwen3.5-plus). |
| Anthropic | AnthropicChatModel | Anthropic Claude models, supporting both chat and multimodal input (e.g., the Claude 3 family and later). |
| Gemini | GeminiChatModel | Google Gemini models. |
| Ollama | OllamaChatModel | Ollama’s local LLM hosting solution. |
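
As a minimal sketch of accumulative streaming, the snippet below calls `OpenAIChatModel` with streaming enabled and prints each yielded chunk. The class name comes from the table above; the constructor arguments, the message dict format, and the `content` field on each chunk are assumptions to verify against the API reference.

```python
# Sketch: consume AgentScope's accumulative streaming output.
# Constructor arguments, the message format, and the `content` field
# are assumptions; verify against the API reference.
import asyncio
import os

from agentscope.model import OpenAIChatModel  # module path assumed

async def main() -> None:
    model = OpenAIChatModel(
        model_name="gpt-4o-mini",  # hypothetical model choice
        api_key=os.environ["OPENAI_API_KEY"],
        stream=True,
    )

    res = await model([{"role": "user", "content": "Tell me a joke."}])

    last = None
    async for chunk in res:
        # Accumulative: each chunk carries ALL content generated so far,
        # so the final chunk is the complete reply.
        last = chunk
        print(chunk.content)

    print("final:", getattr(last, "content", None))

asyncio.run(main())
```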
Each chat model is paired with a formatter, which:

- converts AgentScope's `Msg` objects into the expected input format for each LLM API, and
- adapts multi-agent conversation context into the two-role chatbot format by prefixing messages with agent names and wrapping them in `<history>` tags.

Formatters come in two flavors: `ChatFormatter` (e.g., `DashScopeChatFormatter`) for two-party conversations (user + assistant), and `MultiAgentFormatter` (e.g., `DashScopeMultiAgentFormatter`) for multi-agent conversations.
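
To make the distinction concrete, the sketch below formats the same `Msg` list with both formatter flavors. The formatter class names come from this section; the `Msg(name, content, role)` constructor, the module paths, and the async `format` method are assumptions to check against the API reference.

```python
# Sketch: format the same Msg list for a two-party chat vs. a
# multi-agent conversation. The Msg constructor, module paths, and the
# async `format` method are assumptions; verify against the API reference.
import asyncio

from agentscope.formatter import (  # paths assumed
    DashScopeChatFormatter,
    DashScopeMultiAgentFormatter,
)
from agentscope.message import Msg  # path assumed

async def main() -> None:
    msgs = [
        Msg("user", "Hi there!", "user"),
        Msg("Friday", "Hello! How can I help?", "assistant"),
    ]

    # Two-party conversation: maps directly onto user/assistant roles.
    chat_prompt = await DashScopeChatFormatter().format(msgs)

    # Multi-agent conversation: agent names are prefixed and the shared
    # history is wrapped in <history> tags for a two-role API.
    multi_prompt = await DashScopeMultiAgentFormatter().format(msgs)

    print(chat_prompt)
    print(multi_prompt)

asyncio.run(main())
```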
## TTS Model

TTS (Text-to-Speech) models convert text into audio. AgentScope supports both non-realtime and realtime TTS models:

| API Provider | Class | Description |
|---|---|---|
| DashScope | DashScopeTTSModel | Non-realtime TTS |
| DashScope | DashScopeRealtimeTTSModel | Realtime TTS with streaming text input for minimal latency |
| DashScope CosyVoice | DashScopeCosyVoiceTTSModel | Non-realtime TTS with enhanced expressiveness and naturalness via DashScope’s CosyVoice technology |
| DashScope CosyVoice | DashScopeCosyVoiceRealtimeTTSModel | Realtime TTS with CosyVoice technology for the most natural and expressive speech synthesis |
| OpenAI | OpenAITTSModel | OpenAI’s TTS model, supporting high-quality speech synthesis with various voice options. |
| Gemini | GeminiTTSModel | Google Gemini’s TTS model, offering natural and expressive speech synthesis. |
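
As a rough sketch of the non-realtime path, the snippet below synthesizes one sentence with `DashScopeTTSModel` and writes the result to disk. Only the class name comes from the table above; the module path, constructor arguments, call signature, and `audio` field are assumptions to verify against the API reference.

```python
# Sketch: non-realtime TTS. Module path, constructor arguments, call
# signature, and the `audio` field are assumptions; verify against
# the API reference.
import asyncio
import os

from agentscope.model import DashScopeTTSModel  # module path assumed

async def main() -> None:
    tts = DashScopeTTSModel(
        model_name="qwen-tts",  # hypothetical model choice
        api_key=os.environ["DASHSCOPE_API_KEY"],
    )

    res = await tts("Hello from AgentScope!")

    # Assumed: the response exposes the synthesized audio as raw bytes.
    with open("hello.wav", "wb") as f:
        f.write(res.audio)

asyncio.run(main())
```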
## Realtime Model

Realtime models provide bidirectional, persistent communication over WebSocket, designed primarily for voice-agent scenarios where the user speaks and the model responds with speech in real time.

| API Provider | Class | Description |
|---|---|---|
| OpenAI | OpenAIRealtimeModel | OpenAI’s realtime model, supporting audio and text input, tool use, and server-side voice activity detection (VAD). |
| DashScope | DashScopeRealtimeModel | DashScope’s realtime model, supporting audio and image input for rich multimodal interactions. |
| Gemini | GeminiRealtimeModel | Google’s Gemini realtime model, supporting audio, text, and image input, tool use, and server-side voice activity detection (VAD). |
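
AgentScope's realtime classes wrap this communication for you, but the underlying shape is worth seeing once: one task pushes audio frames up while another consumes server events over a single persistent connection. The sketch below illustrates that generic pattern with the third-party `websockets` package and a hypothetical endpoint; it is not AgentScope's realtime API.

```python
# Generic bidirectional WebSocket pattern (NOT AgentScope's API): one
# task uploads audio frames while another consumes server events over
# the same persistent connection.
import asyncio

import websockets  # third-party: pip install websockets

async def send_audio(ws) -> None:
    for _ in range(10):
        await ws.send(b"\x00" * 320)  # stand-in for a 20 ms PCM frame
        await asyncio.sleep(0.02)

async def receive_events(ws) -> None:
    async for message in ws:
        print("event:", message[:40])

async def main() -> None:
    # Hypothetical endpoint; real providers also require auth headers.
    async with websockets.connect("wss://example.com/realtime") as ws:
        await asyncio.gather(send_audio(ws), receive_events(ws))

asyncio.run(main())
```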
## Embedding Model

Embedding models generate vector representations for text, images, and other data types. These embeddings are used for retrieval, similarity search, and as input features for downstream tasks.

| API Provider | Class | Description |
|---|---|---|
| OpenAI | OpenAITextEmbedding | OpenAI’s text embedding API. |
| DashScope | DashScopeTextEmbedding | DashScope’s text embedding model. |
| DashScope | DashScopeMultiModalEmbedding | DashScope’s multimodal embedding model, generating unified embeddings for both text and images, enabling cross-modal retrieval and understanding. |
| Gemini | GeminiTextEmbedding | Google Gemini’s text embedding model. |
| Ollama | OllamaTextEmbedding | Ollama’s local embedding model for text data. |
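
As a hedged sketch of the similarity-search use case, the snippet below embeds two sentences with `DashScopeTextEmbedding` and compares them with cosine similarity. Only the class name comes from the table above; the module path, constructor arguments, and response shape are assumptions to check against the API reference.

```python
# Sketch: embed two texts and compare them with cosine similarity.
# Module path, constructor arguments, and the response shape are
# assumptions; verify against the API reference.
import asyncio
import math
import os

from agentscope.embedding import DashScopeTextEmbedding  # path assumed

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

async def main() -> None:
    model = DashScopeTextEmbedding(
        model_name="text-embedding-v3",  # hypothetical model choice
        api_key=os.environ["DASHSCOPE_API_KEY"],
    )

    res = await model(["A cat sits on a mat.", "A kitten rests on a rug."])

    # Assumed: res.embeddings holds one float vector per input text.
    print("cosine similarity:", cosine(res.embeddings[0], res.embeddings[1]))

asyncio.run(main())
```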