RAG - AgentScope

In AgentScope, RAG is composed of the following independently replaceable modules:

Module	Description
Parser	Splits a raw file into a list of `Section` objects, where each `Section` corresponds to a natural boundary in the source (PDF page, PPTX slide, Markdown heading block, whole image, etc.)
Chunker	Cuts `Section`s into the final `Chunk`s to be indexed; never merges across `Section`s
Embedding Model	Embeds a `Chunk`’s text or multimodal content into a vector
Vector Store	Connects to a vector database, stores `Chunk` vectors with metadata, and supports retrieval
KnowledgeBase handle	Binds together an embedding model, a vector store, and a collection, exposing `insert_document` / `search` / `list_documents` / `delete_document` as the one-stop entry point

This chapter focuses on using RAG in non-service scenarios — indexing files, retrieving knowledge, and integrating with an agent.

For embedding models and how to configure them, see the Embedding Model chapter; for the service version of RAG (with an HTTP service, file hosting, and distributed indexing), see RAG Service.

Existing Implementations

AgentScope ships out-of-the-box default implementations for every module, all inheriting from base classes so users can easily swap them out:

Parser

Class	Description	Supported File Types
`TextParser`	Text parser: the entire file is returned as a single `Section` and split downstream by a chunker	`text/plain` `text/markdown` `text/csv` `text/html` `text/x-rst` `application/json` `application/xml` `application/x-yaml`
`PDFParser`	PDF parser, one `Section` per page; the metadata carries a `page` field that starts at 1.	`application/pdf`
`PPTParser`	PowerPoint (`.pptx`) parser that walks slides in order: text and tables are merged into the same `Section`, while images are read as standalone `DataBlock`s. The metadata carries a `slide` field that starts at 1.	`application/vnd.openxmlformats-officedocument.presentationml.presentation`
`ImageParser`	Image parser, reads the entire image as a single `Section`	`image/png` `image/jpeg` `image/gif` `image/bmp` `image/webp`
In development …

The PDF and PPT parsers depend on additional third-party libraries; install them all at once with pip install "agentscope[rag]" (the quotes keep shells such as zsh from interpreting the square brackets).

Chunker

Class	Description
`ApproxTokenChunker`	Splits text by approximate token count, without depending on any tokenizer. Approximation strategy: `len(text.encode("utf-8")) // 4`; multimodal `DataBlock`s pass through unchanged.
In development …
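
The approximation strategy in the table can be reproduced with the standard library alone. The helper below is a minimal sketch; `approx_token_count` is a hypothetical name used for illustration and is not part of the AgentScope API.

```python
def approx_token_count(text: str) -> int:
    """Estimate tokens as the UTF-8 byte length divided by 4 (integer division)."""
    return len(text.encode("utf-8")) // 4


# ASCII text uses one byte per character: 8 characters -> 8 bytes -> 2 tokens.
ascii_estimate = approx_token_count("abcdefgh")

# These CJK characters use three bytes each in UTF-8: 4 characters -> 12 bytes -> 3 tokens.
cjk_estimate = approx_token_count("猫咪睡觉")
```

Because the estimate counts bytes rather than characters, non-ASCII text produces more estimated tokens per character than ASCII text does.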

Embedding Model

See the Embedding Model chapter.

Vector Database

Class	Description
`QdrantStore`	Qdrant-based vector database implementation, supporting in-memory (`location=":memory:"`), local disk (`path=...`), and remote service (`url=...`) deployments
In development …

Using RAG

AgentScope recommends going through the KnowledgeBase handle as the entry point for RAG. It binds an embedding model, a vector store, a collection (and an optional metadata_filter for multi-tenant isolation) together and exposes only four operations:

Method	Description
`insert_document(chunks, document_id=None, document_metadata=None)`	Embeds and writes a batch of `Chunk`s as a single document; returns the `document_id`
`search(queries, top_k=5, score_threshold=None)`	Runs vector retrieval over a list of queries (`str` / `TextBlock` / `DataBlock`), with automatic deduplication and sorting
`delete_document(document_id)`	Removes every chunk of one document by its `document_id`
`list_documents()`	Returns a list of `DocumentSummary` entries for every document in this knowledge base

Indexing a File

Indexing a file goes through three steps — file parsing → chunking → embedding + insertion — one per module. The end-to-end flow:

Parse the file

Call the parser’s parse method to read the raw file into a list of Sections, where each Section corresponds to a natural boundary in the source (PDF page / PPT slide / image …). The file parameter of parse(file, filename) accepts both bytes and str:

bytes is treated as the raw file content;
str in a binary parser (PDFParser / PPTParser / ImageParser) is a filesystem path that the parser reads from disk for you;
str in TextParser is disambiguated at runtime — if the string names an existing file it is read and decoded with the configured encoding; otherwise it is used verbatim as pre-decoded text.

from agentscope.rag import TextParser

parser = TextParser()

# 1) Pass bytes directly
sections = await parser.parse(
    file=b"# Cats\nCats sleep 12-16 hours per day.\n",
    filename="cats.md",
)

# 2) Pass a file path (an existing file is read from disk)
sections = await parser.parse(file="./cats.md", filename="cats.md")
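
The runtime disambiguation that TextParser applies to a str argument can be pictured with a small standard-library sketch. `resolve_text_input` is a hypothetical helper written for this illustration; it mirrors the three rules listed above and is not the parser's actual code.

```python
import os
import tempfile
from typing import Union


def resolve_text_input(file: Union[bytes, str], encoding: str = "utf-8") -> str:
    """Return decoded text for the three input shapes described above."""
    if isinstance(file, bytes):
        # Raw file content: decode it with the configured encoding.
        return file.decode(encoding)
    if os.path.isfile(file):
        # The string names an existing file: read and decode it.
        with open(file, "r", encoding=encoding) as handle:
            return handle.read()
    # Otherwise the string is used verbatim as pre-decoded text.
    return file


from_bytes = resolve_text_input(b"# Cats\n")
from_text = resolve_text_input("# Cats\nCats sleep a lot.\n")

# Write a temporary file so the path branch can be exercised.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "cats.md")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("# Cats on disk\n")
    from_path = resolve_text_input(path)
```

One consequence of this rule is that a string which happens to match an existing path is read from disk; passing bytes avoids the ambiguity when the content is already in memory.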

Split into Chunks

Call the chunker’s chunk method to turn the Section list into the final Chunk list to be indexed. Conventions: never merge across Sections; multimodal DataBlocks pass through as whole chunks; chunk_index runs consecutively from 0; every chunk carries the same total_chunks.

from agentscope.rag import ApproxTokenChunker

chunker = ApproxTokenChunker(chunk_size=256, overlap=32)
chunks = await chunker.chunk(sections)
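
To see how chunk_size and overlap interact, the sketch below cuts a string into overlapping windows. It is a simplified illustration, not the ApproxTokenChunker algorithm: it assumes ASCII text (4 characters per approximate token) and assumes overlap is measured in the same approximate tokens as chunk_size. `split_with_overlap` is a hypothetical name.

```python
def split_with_overlap(text: str, chunk_size: int, overlap: int) -> list:
    """Cut text into windows of chunk_size approximate tokens sharing `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    chars_per_token = 4  # assumption: ASCII text, one byte per character
    window = chunk_size * chars_per_token
    step = (chunk_size - overlap) * chars_per_token
    pieces = []
    start = 0
    while start < len(text):
        pieces.append(text[start : start + window])
        if start + window >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return pieces


# 100 characters, windows of 10 tokens (40 chars) that share 2 tokens (8 chars).
text = "".join(chr(ord("a") + i % 26) for i in range(100))
pieces = split_with_overlap(text, chunk_size=10, overlap=2)
```

Each window starts 32 characters after the previous one, so the last 8 characters of one piece reappear as the first 8 characters of the next.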

Write to the knowledge base

Construct a KnowledgeBase handle and write the chunk list — embedding and storage are taken care of by the handle. All chunks of the same document share one document_id, which makes whole-document deletion easy.

from agentscope.credential import DashScopeCredential
from agentscope.embedding import DashScopeEmbeddingModel
from agentscope.rag import KnowledgeBase, QdrantStore

embedding_model = DashScopeEmbeddingModel(
    credential=DashScopeCredential(api_key="YOUR_API_KEY"),
    model="text-embedding-v4",
    dimensions=1024,
)

# QdrantStore is an async context manager; entering it opens the client connection
store = QdrantStore(location=":memory:")  # or url="http://..." for a real cluster

async with store:
    knowledge = KnowledgeBase(
        name="demo-kb",
        description="A toy corpus.",
        embedding_model=embedding_model,
        vector_store=store,
        collection="demo-kb",
    )
    # The backing collection is created on first use, sized to embedding_model.dimensions
    document_id = await knowledge.insert_document(
        chunks,
        document_metadata={"filename": "cats.md"},
    )

KnowledgeBase does not open or close the vector store connection itself; enter the VectorStoreBase instance in an async with block before using it.

Vector Retrieval

Call KnowledgeBase.search directly with a list of query strings / TextBlocks / DataBlocks — no manual embedding required:

async with store:
    results = await knowledge.search(
        queries=["When do cats sleep?"],
        top_k=3,
        score_threshold=None,  # only meaningful for cosine / dot-product
    )
    for r in results:
        print(r.score, r.document_id, r.chunk.content)

search does the following internally:

Drops unusable queries: when the bound embedding model’s supports_multimodal == False, DataBlock queries are silently dropped.
Batched embedding: every query is embedded in a single batch, then the collection is searched concurrently.
Deduplication: hits are deduplicated by (document_id, chunk_index) keeping the highest score.
Truncation: results are sorted by descending score and truncated to top_k.

The return value is a list of VectorSearchResults; each entry carries score, document_id, and the matched chunk.
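
The deduplication and truncation steps described above can be sketched in a few lines. `Hit` and `merge_hits` are stand-ins invented for this example (they are not AgentScope classes); the logic follows the stated rules: deduplicate by (document_id, chunk_index) keeping the highest score, sort by descending score, then cut to top_k.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Stand-in for one search result; not the AgentScope VectorSearchResult."""
    score: float
    document_id: str
    chunk_index: int


def merge_hits(hit_lists: list, top_k: int) -> list:
    """Merge per-query hit lists into one deduplicated, ranked, truncated list."""
    best = {}
    for hits in hit_lists:
        for hit in hits:
            key = (hit.document_id, hit.chunk_index)
            # Keep only the highest-scoring hit for each chunk.
            if key not in best or hit.score > best[key].score:
                best[key] = hit
    ranked = sorted(best.values(), key=lambda h: h.score, reverse=True)
    return ranked[:top_k]


per_query_hits = [
    [Hit(0.9, "d1", 0), Hit(0.5, "d1", 1)],
    [Hit(0.7, "d1", 1), Hit(0.8, "d2", 0), Hit(0.95, "d1", 0)],
]
top = merge_hits(per_query_hits, top_k=2)
```

Here chunk ("d1", 0) is matched by both queries and survives once with its higher score of 0.95, followed by ("d2", 0) at 0.8; the third distinct chunk falls outside top_k.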

Document Management

KnowledgeBase exposes two document-level helpers:

# List every document (one DocumentSummary per document)
summaries = await knowledge.list_documents()
for s in summaries:
    print(s.document_id, s.source, s.chunk_count, s.metadata)

# Delete every chunk belonging to one document
await knowledge.delete_document(document_id)

DocumentSummary carries the document_id, the original filename (source), the chunk_count, and the metadata recorded on the first chunk by the parser / uploader.
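
As a rough picture of what a per-document summary aggregates, the sketch below groups stored chunk records by document_id. `StoredChunk` and `summarize` are hypothetical stand-ins, and taking the metadata from the record with the lowest chunk_index is an assumption about what "first chunk" means.

```python
from dataclasses import dataclass, field


@dataclass
class StoredChunk:
    """Stand-in for a stored record; not an AgentScope class."""
    document_id: str
    source: str
    chunk_index: int
    metadata: dict = field(default_factory=dict)


def summarize(records: list) -> list:
    """Group records by document_id and build one summary dict per document."""
    grouped = {}
    for record in records:
        grouped.setdefault(record.document_id, []).append(record)
    summaries = []
    for document_id, group in grouped.items():
        # Assumption: the "first chunk" is the one with the lowest chunk_index.
        first = min(group, key=lambda r: r.chunk_index)
        summaries.append(
            {
                "document_id": document_id,
                "source": first.source,
                "chunk_count": len(group),
                "metadata": dict(first.metadata),
            }
        )
    return summaries


records = [
    StoredChunk("doc-1", "cats.md", 1),
    StoredChunk("doc-1", "cats.md", 0, {"filename": "cats.md"}),
    StoredChunk("doc-2", "dogs.md", 0, {"filename": "dogs.md"}),
]
summaries = summarize(records)
```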

Multi-tenant Isolation: `metadata_filter`

When multiple logical knowledge bases need to share one physical collection, pass a metadata_filter when constructing the KnowledgeBase (a typical pattern is stamping every record with {"tenant_id": "..."}):

knowledge = KnowledgeBase(
    name="tenant-a-kb",
    description="...",
    embedding_model=embedding_model,
    vector_store=store,
    collection="shared",
    metadata_filter={"tenant_id": "tenant-a"},
)

metadata_filter is a defense-in-depth mechanism:

search and list_documents restrict records to those matching every key == value pair — nothing ever escapes the scope.
insert_document forces the same metadata fields onto every chunk, so even a buggy or malicious parser cannot rebind a record into another scope.

None (the default) disables filtering — appropriate for deployments where every knowledge base owns its own collection outright.
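
The two halves of this mechanism, matching on read and stamping on write, can be sketched with plain dictionaries. `matches_scope` and `stamp_scope` are hypothetical helpers written for illustration; they model the semantics described above rather than AgentScope's implementation.

```python
from typing import Optional


def matches_scope(metadata: dict, metadata_filter: Optional[dict]) -> bool:
    """True when every key == value pair of the filter is present in metadata."""
    if metadata_filter is None:
        return True  # filtering is disabled
    return all(
        key in metadata and metadata[key] == value
        for key, value in metadata_filter.items()
    )


def stamp_scope(metadata: dict, metadata_filter: Optional[dict]) -> dict:
    """Return a copy of metadata with the filter's fields forced onto it."""
    stamped = dict(metadata)
    if metadata_filter:
        stamped.update(metadata_filter)
    return stamped


tenant_filter = {"tenant_id": "tenant-a"}
# A record whose parser tried to claim another tenant.
spoofed = {"tenant_id": "tenant-b", "page": 3}
stamped = stamp_scope(spoofed, tenant_filter)
```

After stamping, the record carries tenant-a regardless of what the parser wrote, so it can only be found from inside the scope that inserted it.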

Multimodal Support

AgentScope’s RAG natively supports the ingestion and retrieval of multimodal data — the key is matching the parser’s and the embedding model’s capabilities: the former must be able to parse multimodal files into DataBlocks, the latter must be able to embed DataBlocks directly.

Check which file types a Parser supports: every ParserBase subclass declares its capability via the class attribute supported_media_types (a list of IANA media types), which you can read directly or auto-complete in your IDE.

>>> from agentscope.rag import TextParser, ImageParser
>>> TextParser.supported_media_types
['text/plain', 'text/markdown', 'text/csv', 'text/html', 'text/x-rst',
 'application/json', 'application/xml', 'application/x-yaml']
>>> ImageParser.supported_media_types
['image/png', 'image/jpeg', 'image/gif', 'image/bmp', 'image/webp']

Check which modalities an embedding model supports: the instance attribute embedding_model.supports_multimodal tells whether the model can directly handle DataBlocks (images / video / audio).
```
>>> embedding_model.supports_multimodal
True
```

When the parser yields Chunks containing multimodal content and embedding_model.supports_multimodal == True, the ingestion and retrieval pipelines work without any extra configuration. Text-only models silently drop DataBlock queries inside KnowledgeBase.search instead of raising.
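
The silent drop can be modelled as a simple filter over the query list. `TextQuery`, `DataQuery`, and `usable_queries` are stand-ins invented for this sketch; the real queries are str / TextBlock / DataBlock values.

```python
from dataclasses import dataclass


@dataclass
class TextQuery:
    """Stand-in for a text query (str or TextBlock in the real API)."""
    text: str


@dataclass
class DataQuery:
    """Stand-in for a multimodal DataBlock query."""
    media_type: str


def usable_queries(queries: list, supports_multimodal: bool) -> list:
    """Keep every query for multimodal models; drop DataQuery items otherwise."""
    if supports_multimodal:
        return list(queries)
    return [q for q in queries if not isinstance(q, DataQuery)]


queries = [TextQuery("a sleeping cat"), DataQuery("image/png")]
text_only = usable_queries(queries, supports_multimodal=False)
multimodal = usable_queries(queries, supports_multimodal=True)
```

Because nothing is raised, an image-only query sent to a text-only model leaves nothing to search with; checking supports_multimodal first makes that case visible to the caller.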

Integrating with an Agent

RAGMiddleware plugs retrieval into the Agent class’s reasoning-acting loop. The middleware does not own the embedding model or the vector store — it consumes a list of pre-built KnowledgeBase handles, which may mix knowledge bases that use different embedding models. RAGMiddleware supports two working modes (RAGMiddleware.Parameters.mode), which can be used individually or combined (by attaching two instances with different modes):

Mode	Trigger	Retrieval Query	Injection
`"static"`	Before the first reasoning step of each reply (`agent.state.cur_iter == 0`)	The input message of the reply method is used as the retrieval query	Retrieval results are wrapped into a `HintBlock` and injected into the context
`"agentic"` (default)	The model invokes the retrieval tool on its own	Decided by the model itself	Exposes a `search_knowledge` tool — the agent decides when to retrieve and what query to use

All parameters are wrapped in the nested RAGMiddleware.Parameters model:

Field	Default	Description
`mode`	`"agentic"`	Integration mode, see above
`top_k`	`5`	Maximum number of hits returned in one search, deduplicated across knowledge bases and query inputs before truncation
`score_threshold`	`None`	Minimum similarity threshold; only meaningful under cosine / dot-product
`emit_hint_event`	`True`	In `static` mode, whether to additionally emit a `HintBlockEvent` so the frontend can display the matched snippets
`persist_hint`	`False`	In `static` mode, whether the injected block stays persistently in the context (it is removed after reasoning by default, to avoid polluting the next turn)

In addition, in agentic mode RAGMiddleware.list_tools() returns a single search_knowledge tool — you must manually register it in the agent’s Toolkit so the model can call it. The tool’s description automatically lists the name / description of every attached knowledge base; the model can also restrict a search to a subset via the knowledge_bases=[...] argument. Configure RAG on an agent instance with the following code:

from agentscope.middleware import RAGMiddleware
from agentscope.tool import Toolkit

static_mw = RAGMiddleware(
    knowledges=[knowledge],            # One or more KnowledgeBase handles
    parameters=RAGMiddleware.Parameters(
        mode="static",
        top_k=3,
        emit_hint_event=False,
    ),
)

# Agent and chat_model are assumed to be imported and constructed elsewhere
agent = Agent(
    name="static-agent",
    system_prompt="Answer the user's question using the retrieved material.",
    model=chat_model,
    toolkit=Toolkit(),
    middlewares=[static_mw],
)

Custom Extensions

All RAG modules use base-class inheritance, so users can customize Parser, Chunker, Embedding Model, and Vector Store — inherit from the corresponding base class, implement its core methods, and the custom class slots seamlessly into the pipeline above.

Contributions of new Parsers, Chunkers, and Vector Stores to the official AgentScope repository are welcome!

Custom Parser

Inherit from ParserBase, declare the IANA media types you can handle in the class attribute supported_media_types, and implement async def parse(file, filename) to split a byte stream into a list of Sections:

from agentscope.message import TextBlock
from agentscope.rag import ParserBase, Section


class MyMarkdownParser(ParserBase):
    supported_media_types = ["text/markdown"]

    async def parse(
        self,
        file: bytes | str,
        filename: str,
    ) -> list[Section]:
        text = file.decode("utf-8") if isinstance(file, bytes) else file
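        # Split on H2 headings into multiple Sections, preserving the source.
        # Note: str.split removes the "\n## " separator, so every block after
        # the first starts with the heading text without its "## " marker.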
        return [
            Section(
                content=TextBlock(text=block),
                source=filename,
                metadata={"index": index},
            )
            for index, block in enumerate(text.split("\n## "))
        ]

You may also override supported_extensions() if needed (the default reverse-lookup from supported_media_types produces noisy developer extensions; override explicitly when you want the front-end file picker to show only a curated set).
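
Python's standard mimetypes module shows the kind of noise a reverse lookup can produce. Whether AgentScope's default lookup relies on this module is an assumption made for the sketch; `extensions_for` and the curated list are hypothetical.

```python
import mimetypes


def extensions_for(media_types: list) -> list:
    """Collect every extension the standard library associates with the media types."""
    found = set()
    for media_type in media_types:
        found.update(mimetypes.guess_all_extensions(media_type))
    return sorted(found)


# Reverse lookup: the exact result depends on the platform's MIME tables,
# and for text/plain it usually contains more than just ".txt".
looked_up = extensions_for(["text/plain"])

# Curated alternative: an explicit list chosen by the parser author.
curated = [".md", ".markdown"]
```

Printing looked_up on a given machine is a quick way to decide whether an explicit, curated list is the better fit for a file picker.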

Custom Chunker

Inherit from ChunkerBase and implement async def chunk(sections) to turn a list of Sections into the Chunks to be indexed. Conventions: never merge across Sections; multimodal DataBlocks pass through as whole chunks; chunk_index runs consecutively from 0 across the result list; total_chunks stays consistent on every chunk:

from agentscope.message import TextBlock
from agentscope.rag import Chunk, ChunkerBase, Section


class FixedCharChunker(ChunkerBase):
    def __init__(self, chunk_size: int = 1000) -> None:
        self._chunk_size = chunk_size

    async def chunk(self, sections: list[Section]) -> list[Chunk]:
        chunks: list[Chunk] = []
        for section in sections:
            # Multimodal content is not split — pass through as a whole chunk
            if not isinstance(section.content, TextBlock):
                chunks.append(
                    Chunk(
                        content=section.content,
                        source=section.source,
                        chunk_index=0,
                        total_chunks=0,
                        metadata=dict(section.metadata),
                    ),
                )
                continue
            text = section.content.text
            for start in range(0, len(text), self._chunk_size):
                chunks.append(
                    Chunk(
                        content=TextBlock(
                            text=text[start : start + self._chunk_size],
                        ),
                        source=section.source,
                        chunk_index=0,
                        total_chunks=0,
                        metadata=dict(section.metadata),
                    ),
                )
        # Renumber consistently
        for index, chunk in enumerate(chunks):
            chunk.chunk_index = index
            chunk.total_chunks = len(chunks)
        return chunks
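
A small checker can help confirm that a custom chunker honours the numbering conventions. The sketch below only reads the chunk_index and total_chunks attributes used in the example above; `ChunkStub` and `check_numbering` are hypothetical names, and the stub stands in for the real Chunk class so the snippet runs on its own.

```python
from dataclasses import dataclass


@dataclass
class ChunkStub:
    """Stand-in with the two numbering fields; not the AgentScope Chunk class."""
    chunk_index: int
    total_chunks: int


def check_numbering(chunks: list) -> list:
    """Return human-readable problems; an empty list means the numbering is consistent."""
    problems = []
    for position, chunk in enumerate(chunks):
        if chunk.chunk_index != position:
            problems.append(f"position {position}: chunk_index is {chunk.chunk_index}")
        if chunk.total_chunks != len(chunks):
            problems.append(f"position {position}: total_chunks is {chunk.total_chunks}")
    return problems


good = [ChunkStub(0, 2), ChunkStub(1, 2)]
bad = [ChunkStub(0, 0), ChunkStub(0, 0)]  # placeholders that were never renumbered
good_report = check_numbering(good)
bad_report = check_numbering(bad)
```

Running such a check in a unit test catches the common mistake of returning the placeholder values before the renumbering loop has run.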

Custom Vector Database

Inherit from VectorStoreBase, implement create_collection / delete_collection / has_collection / insert / delete / search / list_documents, and manage the underlying connection lifecycle through __aenter__ / __aexit__:

from typing import Any
from agentscope.rag import (
    DocumentSummary,
    VectorRecord,
    VectorSearchResult,
    VectorStoreBase,
)


class MyVectorStore(VectorStoreBase):
    async def __aenter__(self) -> "MyVectorStore":
        self._client = await connect_my_backend(...)
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        await self._client.close()

    async def create_collection(self, name: str, dimensions: int) -> None: ...

    async def delete_collection(self, name: str) -> None: ...

    async def has_collection(self, name: str) -> bool: ...

    async def insert(
        self,
        collection: str,
        records: list[VectorRecord],
    ) -> None: ...

    async def delete(self, collection: str, document_id: str) -> None: ...

    async def search(
        self,
        collection: str,
        query_vector: list[float],
        top_k: int = 5,
        metadata_filter: dict[str, Any] | None = None,
    ) -> list[VectorSearchResult]: ...

    async def list_documents(
        self,
        collection: str,
        metadata_filter: dict[str, Any] | None = None,
    ) -> list[DocumentSummary]: ...

Implementation notes:

delete removes every record belonging to a document_id; callers add and remove documents as a unit.
search and list_documents must translate metadata_filter into a backend-native payload filter so multi-tenant isolation works.
insert must persist both VectorRecord.document_id and the chunk — otherwise delete and list_documents cannot work.
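
The notes above can be tried out against a brute-force, in-memory toy before wiring up a real backend. This is a sketch under stated assumptions: `Record` and `ToyStore` are hypothetical, the methods are synchronous (the real base class is async), scoring uses cosine similarity, and search returns (score, record) tuples rather than VectorSearchResult objects.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Record:
    """Stand-in for a stored vector record; not the AgentScope VectorRecord."""
    document_id: str
    vector: list
    metadata: dict = field(default_factory=dict)


def cosine(a: list, b: list) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


class ToyStore:
    """Brute-force store keyed by collection name."""

    def __init__(self) -> None:
        self._collections = {}

    def insert(self, collection: str, records: list) -> None:
        # Persist the whole record so delete and filtering can use its fields.
        self._collections.setdefault(collection, []).extend(records)

    def delete(self, collection: str, document_id: str) -> None:
        # Remove every record that belongs to the document.
        existing = self._collections.get(collection, [])
        self._collections[collection] = [
            r for r in existing if r.document_id != document_id
        ]

    def search(self, collection, query_vector, top_k=5, metadata_filter=None):
        wanted = metadata_filter or {}
        candidates = [
            r
            for r in self._collections.get(collection, [])
            if all(k in r.metadata and r.metadata[k] == v for k, v in wanted.items())
        ]
        scored = [(cosine(query_vector, r.vector), r) for r in candidates]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


toy = ToyStore()
toy.insert(
    "shared",
    [
        Record("doc-a", [1.0, 0.0], {"tenant_id": "tenant-a"}),
        Record("doc-b", [0.0, 1.0], {"tenant_id": "tenant-b"}),
        Record("doc-c", [0.6, 0.8], {"tenant_id": "tenant-a"}),
    ],
)
hits = toy.search("shared", [1.0, 0.0], metadata_filter={"tenant_id": "tenant-a"})
toy.delete("shared", "doc-a")
after_delete = toy.search("shared", [1.0, 0.0])
```

A production store would push the filter down to the database instead of scanning every record, which is what the note about backend-native payload filters refers to.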

Further Reading

RAG Service

A multi-tenant, distributed RAG service with HTTP API, file hosting, and managed vector databases.

Middleware

See how RAGMiddleware plugs into the reply / reasoning hooks.

Embedding Model

Available embedding models and their parameters.