In AgentScope, RAG is composed of the following independently replaceable modules:
| Module | Description |
| --- | --- |
| Parser | Splits a raw file into a list of Section objects, where each Section corresponds to a natural boundary in the source (PDF page, PPTX slide, Markdown heading block, whole image, etc.) |
| Chunker | Cuts Sections into the final Chunks to be indexed; never merges across Sections |
| Embedding Model | Embeds a Chunk’s text or multimodal content into a vector |
| Vector Store | Connects to a vector database, stores Chunk vectors with metadata, and supports retrieval |
| KnowledgeBase handle | Binds together an embedding model, a vector store, and a collection, exposing insert_document / search / list_documents / delete_document as the one-stop entry point |
This chapter focuses on using RAG in non-service scenarios — indexing files, retrieving knowledge, and integrating with an agent.
For embedding models and how to configure them, see the Embedding Model chapter; for the service version of RAG (with an HTTP service, file hosting, and distributed indexing), see RAG Service.

Existing Implementations

AgentScope ships out-of-the-box default implementations for every module, all inheriting from base classes so users can easily swap them out:

Parser

| Class | Description | Supported File Types |
| --- | --- | --- |
| TextParser | Text parser: the entire file is returned as a single Section and split downstream by a chunker | text/plain, text/markdown, text/csv, text/html, text/x-rst, application/json, application/xml, application/x-yaml |
| PDFParser | PDF parser, one Section per page; the metadata carries a page field that starts at 1 | application/pdf |
| PPTParser | PowerPoint (.pptx) parser that walks slides in order: text and tables are merged into the same Section; images are read as standalone DataBlocks. The metadata carries a slide field that starts at 1 | application/vnd.openxmlformats-officedocument.presentationml.presentation |
| ImageParser | Image parser, reads the entire image as a single Section | image/png, image/jpeg, image/gif, image/bmp, image/webp |
In development …
The PDF and PPT parsers depend on additional third-party libraries; install them all at once with pip install "agentscope[rag]" (the quotes keep shells such as zsh from interpreting the square brackets).

Chunker

| Class | Description |
| --- | --- |
| ApproxTokenChunker | Splits text by approximate token count, without depending on any tokenizer. Approximation strategy: len(text.encode("utf-8")) // 4; multimodal DataBlocks pass through unchanged. |
In development …
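The approximation strategy quoted in the table can be tried out in isolation. The sketch below is only an illustration of that formula; the helper name approx_token_count is hypothetical and is not part of AgentScope.

```python
def approx_token_count(text: str) -> int:
    """Estimate a token count as the UTF-8 byte length divided by 4 (floored)."""
    return len(text.encode("utf-8")) // 4


# ASCII text takes one byte per character, so roughly 4 characters per "token".
ascii_estimate = approx_token_count("Cats sleep 12-16 hours per day.")

# CJK characters take three bytes each in UTF-8, so they count for more per character.
cjk_estimate = approx_token_count("猫每天睡十二个小时")
```

Because the estimate is byte-based, the same chunk_size admits fewer characters of multi-byte text than of ASCII text.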

Embedding Model

See the Embedding Model chapter.

Vector Database

| Class | Description |
| --- | --- |
| QdrantStore | Qdrant-based vector database implementation, supporting in-memory (location=":memory:"), local disk (path=...), and remote service (url=...) deployments |
In development …

Using RAG

AgentScope recommends going through the KnowledgeBase handle as the entry point for RAG. It binds an embedding model, a vector store, a collection (and an optional metadata_filter for multi-tenant isolation) together and exposes only four operations:
| Method | Description |
| --- | --- |
| insert_document(chunks, document_id=None, document_metadata=None) | Embeds and writes a batch of Chunks as a single document; returns the document_id |
| search(queries, top_k=5, score_threshold=None) | Runs vector retrieval over a list of queries (str / TextBlock / DataBlock), with automatic deduplication and sorting |
| delete_document(document_id) | Removes every chunk of one document by its document_id |
| list_documents() | Returns a list of DocumentSummary entries for every document in this knowledge base |

Indexing a File

Indexing a file goes through three steps — file parsing → chunking → embedding + insertion — one per module. The end-to-end flow:
1. Parse the file

Call the parser’s parse method to read the raw file into a list of Sections, where each Section corresponds to a natural boundary in the source (PDF page / PPT slide / image …).
The file parameter of parse(file, filename) accepts both bytes and str:
  • bytes is treated as the raw file content;
  • str in a binary parser (PDFParser / PPTParser / ImageParser) is a filesystem path that the parser reads from disk for you;
  • str in TextParser is disambiguated at runtime — if the string names an existing file it is read and decoded with the configured encoding; otherwise it is used verbatim as pre-decoded text.
from agentscope.rag import TextParser

parser = TextParser()

# 1) Pass bytes directly
sections = await parser.parse(
    file=b"# Cats\nCats sleep 12-16 hours per day.\n",
    filename="cats.md",
)

# 2) Pass a file path (an existing file is read from disk)
sections = await parser.parse(file="./cats.md", filename="cats.md")
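The runtime disambiguation of str inputs described above for TextParser could look roughly like the following. This is a minimal sketch under assumptions: resolve_text_input is a hypothetical helper, and the real parser may perform the check differently.

```python
import os
import tempfile
from typing import Union


def resolve_text_input(file: Union[bytes, str], encoding: str = "utf-8") -> str:
    """Return text from raw bytes, from an existing file path, or from literal text."""
    if isinstance(file, bytes):
        # bytes are always treated as raw file content
        return file.decode(encoding)
    if os.path.isfile(file):
        # a str that names an existing file is read from disk
        with open(file, encoding=encoding) as handle:
            return handle.read()
    # any other str is used verbatim as pre-decoded text
    return file


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cats.md")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("# Cats\n")
    from_path = resolve_text_input(path)

from_bytes = resolve_text_input(b"# Cats\n")
from_text = resolve_text_input("# Cats\n")
```

One consequence of this kind of check is that a short string which happens to match an existing filename is read from disk, so passing bytes is the unambiguous choice when the content is already in memory.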
2. Split into Chunks

Call the chunker’s chunk method to turn the Section list into the final Chunk list to be indexed. Conventions: never merge across Sections; multimodal DataBlocks pass through as whole chunks; chunk_index runs consecutively from 0; every chunk carries the same total_chunks.
from agentscope.rag import ApproxTokenChunker

chunker = ApproxTokenChunker(chunk_size=256, overlap=32)
chunks = await chunker.chunk(sections)
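To build intuition for how chunk_size and overlap interact, here is a minimal sliding-window sketch. It measures windows in characters rather than approximate tokens, and the function sliding_windows is hypothetical; the source does not describe how ApproxTokenChunker places its boundaries, so treat this as an illustration of the general technique only.

```python
def sliding_windows(text: str, size: int, overlap: int) -> list[str]:
    """Cut text into windows of `size` characters, each sharing `overlap` characters with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    windows: list[str] = []
    for start in range(0, len(text), step):
        windows.append(text[start : start + size])
        if start + size >= len(text):
            # this window already reaches the end of the text
            break
    return windows


windows = sliding_windows("abcdefghij", size=4, overlap=1)
```

Each window starts size - overlap characters after the previous one, so a larger overlap produces more, and more redundant, chunks.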
3. Write to the knowledge base

Construct a KnowledgeBase handle and write the chunk list — embedding and storage are taken care of by the handle. All chunks of the same document share one document_id, which makes whole-document deletion easy.
from agentscope.credential import DashScopeCredential
from agentscope.embedding import DashScopeEmbeddingModel
from agentscope.rag import KnowledgeBase, QdrantStore

embedding_model = DashScopeEmbeddingModel(
    credential=DashScopeCredential(api_key="YOUR_API_KEY"),
    model="text-embedding-v4",
    dimensions=1024,
)

# QdrantStore is an async context manager; entering it opens the client connection
store = QdrantStore(location=":memory:")  # or url="http://..." for a real cluster

async with store:
    knowledge = KnowledgeBase(
        name="demo-kb",
        description="A toy corpus.",
        embedding_model=embedding_model,
        vector_store=store,
        collection="demo-kb",
    )
    # The backing collection is created on first use, sized to embedding_model.dimensions
    document_id = await knowledge.insert_document(
        chunks,
        document_metadata={"filename": "cats.md"},
    )
KnowledgeBase does not open or close the vector store connection itself; enter the VectorStoreBase instance in an async with block before using it.

Vector Retrieval

Call KnowledgeBase.search directly with a list of query strings / TextBlocks / DataBlocks — no manual embedding required:
async with store:
    results = await knowledge.search(
        queries=["When do cats sleep?"],
        top_k=3,
        score_threshold=None,  # only meaningful for cosine / dot-product
    )
    for r in results:
        print(r.score, r.document_id, r.chunk.content)
search does the following internally:
  1. Drops unusable queries: when the bound embedding model’s supports_multimodal == False, DataBlock queries are silently dropped.
  2. Batched embedding: every query is embedded in a single batch, then the collection is searched concurrently.
  3. Deduplication: hits are deduplicated by (document_id, chunk_index) keeping the highest score.
  4. Truncation: results are sorted by descending score and truncated to top_k.
The return value is a list of VectorSearchResults; each entry carries score, document_id, and the matched chunk.
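The deduplication and truncation steps listed above can be sketched in plain Python. The Hit dataclass below is a hypothetical stand-in for a search result (it is not the library's VectorSearchResult), and the sketch assumes that a higher score means a better match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical stand-in for one search hit."""

    document_id: str
    chunk_index: int
    score: float


def merge_hits(hits: list[Hit], top_k: int) -> list[Hit]:
    """Deduplicate by (document_id, chunk_index), keep the best score, sort, truncate."""
    best: dict[tuple[str, int], Hit] = {}
    for hit in hits:
        key = (hit.document_id, hit.chunk_index)
        if key not in best or hit.score > best[key].score:
            best[key] = hit
    ranked = sorted(best.values(), key=lambda h: h.score, reverse=True)
    return ranked[:top_k]


# Two queries matched the same chunk of doc-a with different scores.
hits = [
    Hit("doc-a", 0, 0.71),
    Hit("doc-a", 0, 0.93),
    Hit("doc-b", 2, 0.80),
    Hit("doc-a", 1, 0.40),
]
merged = merge_hits(hits, top_k=2)
```

Deduplicating before truncation matters: truncating first could spend several of the top_k slots on the same chunk matched by different queries.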

Document Management

KnowledgeBase exposes two document-level helpers:
# List every document (one DocumentSummary per document)
summaries = await knowledge.list_documents()
for s in summaries:
    print(s.document_id, s.source, s.chunk_count, s.metadata)

# Delete every chunk belonging to one document
await knowledge.delete_document(document_id)
DocumentSummary carries the document_id, the source (the original filename), chunk_count, and the metadata recorded on the first chunk by the parser / uploader.

Multi-tenant Isolation: metadata_filter

When multiple logical knowledge bases need to share one physical collection, pass a metadata_filter when constructing the KnowledgeBase (a typical pattern is stamping every record with {"tenant_id": "..."}):
knowledge = KnowledgeBase(
    name="tenant-a-kb",
    description="...",
    embedding_model=embedding_model,
    vector_store=store,
    collection="shared",
    metadata_filter={"tenant_id": "tenant-a"},
)
metadata_filter is a defense-in-depth mechanism:
  • search and list_documents restrict records to those matching every key == value pair, so records outside the scope are never returned.
  • insert_document forces the same metadata fields onto every chunk, so even a buggy or malicious parser cannot rebind a record into another scope.
None (the default) disables filtering — appropriate for deployments where every knowledge base owns its own collection outright.
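The two halves of this mechanism, filtering on read and stamping on write, can be modelled with plain dictionaries. The helpers matches_filter and stamp_scope are hypothetical names used for illustration; the actual filtering is done by the vector store backend.

```python
from typing import Any, Optional


def matches_filter(
    metadata: dict[str, Any],
    metadata_filter: Optional[dict[str, Any]],
) -> bool:
    """A record is in scope only if every key == value pair of the filter matches."""
    if metadata_filter is None:
        return True  # filtering disabled
    return all(
        key in metadata and metadata[key] == value
        for key, value in metadata_filter.items()
    )


def stamp_scope(
    metadata: dict[str, Any],
    metadata_filter: Optional[dict[str, Any]],
) -> dict[str, Any]:
    """Force the scope fields onto a record; filter values win over existing ones."""
    if metadata_filter is None:
        return dict(metadata)
    return {**metadata, **metadata_filter}


scope = {"tenant_id": "tenant-a"}

# A parser tried to bind the record to another tenant; stamping overrides it.
stamped = stamp_scope({"filename": "cats.md", "tenant_id": "tenant-b"}, scope)

records = [stamped, {"filename": "dogs.md", "tenant_id": "tenant-b"}]
visible = [r for r in records if matches_filter(r, scope)]
```

Putting the filter values last in the merge is what prevents a record's own metadata from rebinding it into another scope.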

Multimodal Support

AgentScope’s RAG natively supports the ingestion and retrieval of multimodal data — the key is matching the parser’s and the embedding model’s capabilities: the former must be able to parse multimodal files into DataBlocks, the latter must be able to embed DataBlocks directly.
  • Check which file types a Parser supports: every ParserBase subclass declares its capability via the class attribute supported_media_types (a list of IANA media types), which you can read directly or auto-complete in your IDE.
    >>> from agentscope.rag import TextParser, ImageParser
    >>> TextParser.supported_media_types
    ['text/plain', 'text/markdown', 'text/csv', 'text/html', 'text/x-rst',
     'application/json', 'application/xml', 'application/x-yaml']
    >>> ImageParser.supported_media_types
    ['image/png', 'image/jpeg', 'image/gif', 'image/bmp', 'image/webp']
    
  • Check which modalities an embedding model supports: the instance attribute embedding_model.supports_multimodal tells whether the model can directly handle DataBlocks (images / video / audio).
    >>> embedding_model.supports_multimodal
    True
    
When the parser yields Chunks containing multimodal content and embedding_model.supports_multimodal == True, the ingestion and retrieval pipelines work without any extra configuration. Text-only models silently drop DataBlock queries inside KnowledgeBase.search instead of raising.
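Since every parser advertises its capability through supported_media_types, one plausible way to route a file to a parser is to guess its media type from the filename and look for a match. The sketch below uses the standard library's mimetypes module; FakeTextParser, FakeImageParser, and pick_parser are hypothetical stand-ins, not AgentScope classes or functions.

```python
import mimetypes
from typing import Optional


class FakeTextParser:
    """Stand-in that only mimics the supported_media_types class attribute."""

    supported_media_types = ["text/plain", "text/csv"]


class FakeImageParser:
    """Stand-in that only mimics the supported_media_types class attribute."""

    supported_media_types = ["image/png", "image/jpeg"]


def pick_parser(filename: str, parsers: list[type]) -> Optional[type]:
    """Return the first parser class that declares the file's guessed media type."""
    media_type, _encoding = mimetypes.guess_type(filename)
    for parser in parsers:
        if media_type in parser.supported_media_types:
            return parser
    return None  # no parser declares this media type (or the type is unknown)


parsers = [FakeTextParser, FakeImageParser]
image_choice = pick_parser("photo.png", parsers)
text_choice = pick_parser("notes.txt", parsers)
unsupported = pick_parser("archive.zip", parsers)
```

Whichever parser is chosen, multimodal ingestion still requires an embedding model whose supports_multimodal is True, as described above.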

Integrating with an Agent

RAGMiddleware plugs retrieval into the Agent class’s reasoning-acting loop. The middleware does not own the embedding model or the vector store — it consumes a list of pre-built KnowledgeBase handles, which may mix knowledge bases that use different embedding models. RAGMiddleware supports two working modes (RAGMiddleware.Parameters.mode), which can be used individually or combined (by attaching two instances with different modes):
| Mode | Trigger | Retrieval Query | Injection |
| --- | --- | --- | --- |
| "static" | Before the first reasoning step of each reply (agent.state.cur_iter == 0) | The input message of the reply method is used as the retrieval query | Retrieval results are wrapped into a HintBlock and injected into the context |
| "agentic" (default) | The model invokes the retrieval tool on its own | Decided by the model itself | Exposes a search_knowledge tool; the agent decides when to retrieve and what query to use |
All parameters are wrapped in the nested RAGMiddleware.Parameters model:
| Field | Default | Description |
| --- | --- | --- |
| mode | "agentic" | Integration mode, see above |
| top_k | 5 | Maximum number of hits returned in one search, deduplicated across knowledge bases and query inputs before truncation |
| score_threshold | None | Minimum similarity threshold; only meaningful under cosine / dot-product |
| emit_hint_event | True | In static mode, whether to additionally emit a HintBlockEvent so the frontend can display the matched snippets |
| persist_hint | False | In static mode, whether the injected block stays persistently in the context (it is removed after reasoning by default, to avoid polluting the next turn) |
In agentic mode, RAGMiddleware.list_tools() returns a single search_knowledge tool, which you must register manually in the agent’s Toolkit so the model can call it. The tool’s description automatically lists the name / description of every attached knowledge base; the model can also restrict a search to a subset via the knowledge_bases=[...] argument.
The following code configures RAG in static mode on an agent instance:
from agentscope.middleware import RAGMiddleware
from agentscope.tool import Toolkit

static_mw = RAGMiddleware(
    knowledges=[knowledge],            # One or more KnowledgeBase handles
    parameters=RAGMiddleware.Parameters(
        mode="static",
        top_k=3,
        emit_hint_event=False,
    ),
)

# Agent and chat_model are assumed to be imported and constructed elsewhere
agent = Agent(
    name="static-agent",
    system_prompt="Answer the user's question using the retrieved material.",
    model=chat_model,
    toolkit=Toolkit(),
    middlewares=[static_mw],
)
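The static-mode lifecycle described in the tables above (inject at the first reasoning step, remove afterwards unless persist_hint is set) can be modelled with a toy simulation. Everything here is an assumption for illustration: a plain list of strings stands in for the agent's context, a prefixed string stands in for the HintBlock, and run_reasoning_step is a hypothetical function, not part of RAGMiddleware.

```python
def run_reasoning_step(
    context: list[str],
    retrieved: list[str],
    cur_iter: int,
    persist_hint: bool = False,
) -> list[str]:
    """Simulate one reasoning step and return the context the model would see."""
    inject = cur_iter == 0 and bool(retrieved)
    if inject:
        # Stand-in for wrapping the retrieval results into a HintBlock
        context.append("[hint] " + " | ".join(retrieved))
    seen_by_model = list(context)  # snapshot taken while the hint is present
    if inject and not persist_hint:
        context.pop()  # drop the hint so it does not carry over to the next turn
    return seen_by_model


context = ["user: When do cats sleep?"]
seen = run_reasoning_step(
    context,
    ["Cats sleep 12-16 hours per day."],
    cur_iter=0,
)
```

After the call, seen contains the hint while context does not, which mirrors the default persist_hint=False behaviour described in the table.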

Custom Extensions

All RAG modules use base-class inheritance, so users can customize Parser, Chunker, Embedding Model, and Vector Store — inherit from the corresponding base class, implement its core methods, and the custom class slots seamlessly into the pipeline above.
Contributions of new Parsers, Chunkers, and Vector Stores to the official AgentScope repository are welcome!

Custom Parser

Inherit from ParserBase, declare the IANA media types you can handle in the class attribute supported_media_types, and implement async def parse(file, filename) to turn the file input (bytes or str) into a list of Sections:
from agentscope.message import TextBlock
from agentscope.rag import ParserBase, Section


class MyMarkdownParser(ParserBase):
    supported_media_types = ["text/markdown"]

    async def parse(
        self,
        file: bytes | str,
        filename: str,
    ) -> list[Section]:
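        # bytes are decoded as UTF-8; a str is treated here as pre-decoded text
        text = file.decode("utf-8") if isinstance(file, bytes) else file
        # Split at H2 headings into multiple Sections (the "## " marker itself is
        # consumed by the split); every Section keeps the filename as its source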
        return [
            Section(
                content=TextBlock(text=block),
                source=filename,
                metadata={"index": index},
            )
            for index, block in enumerate(text.split("\n## "))
        ]
You may also override supported_extensions() if needed. The default reverse-lookup from supported_media_types produces noisy developer extensions, so override it explicitly when you want the front-end file picker to show only a curated set.
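To see why a reverse lookup from media types to extensions tends to be noisy, consider the standard library's mimetypes module. Whether AgentScope's default lookup uses this module is an assumption, and curated_extensions is a hypothetical helper, so treat this only as an illustration of the problem an override solves.

```python
import mimetypes

# Reverse lookup: every extension the registry maps to text/plain.
# On a typical CPython install this includes source-code extensions
# such as ".c" and ".h" in addition to ".txt".
all_extensions = sorted(mimetypes.guess_all_extensions("text/plain"))


def curated_extensions(media_type: str, allowed: set[str]) -> list[str]:
    """Keep only the extensions that were explicitly chosen for display."""
    candidates = mimetypes.guess_all_extensions(media_type)
    return sorted(ext for ext in candidates if ext in allowed)


picker_extensions = curated_extensions("text/plain", allowed={".txt"})
```

An explicit allow-list like this is one simple way to keep a file picker limited to the formats users are actually expected to upload.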

Custom Chunker

Inherit from ChunkerBase and implement async def chunk(sections) to turn a list of Sections into the Chunks to be indexed. Conventions: never merge across Sections; multimodal DataBlocks pass through as whole chunks; chunk_index runs consecutively from 0 across the result list; total_chunks stays consistent on every chunk:
from agentscope.message import TextBlock
from agentscope.rag import Chunk, ChunkerBase, Section


class FixedCharChunker(ChunkerBase):
    def __init__(self, chunk_size: int = 1000) -> None:
        self._chunk_size = chunk_size

    async def chunk(self, sections: list[Section]) -> list[Chunk]:
        chunks: list[Chunk] = []
        for section in sections:
            # Multimodal content is not split — pass through as a whole chunk
            if not isinstance(section.content, TextBlock):
                chunks.append(
                    Chunk(
                        content=section.content,
                        source=section.source,
                        chunk_index=0,
                        total_chunks=0,
                        metadata=dict(section.metadata),
                    ),
                )
                continue
            text = section.content.text
            for start in range(0, len(text), self._chunk_size):
                chunks.append(
                    Chunk(
                        content=TextBlock(
                            text=text[start : start + self._chunk_size],
                        ),
                        source=section.source,
                        chunk_index=0,
                        total_chunks=0,
                        metadata=dict(section.metadata),
                    ),
                )
        # Renumber consistently
        for index, chunk in enumerate(chunks):
            chunk.chunk_index = index
            chunk.total_chunks = len(chunks)
        return chunks
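When writing a custom chunker, a small check of the numbering conventions can catch mistakes early. The sketch below is one way to do that; FakeChunk is a hypothetical stand-in carrying only the two numbering fields, and check_chunk_numbering is not an AgentScope function.

```python
from dataclasses import dataclass


@dataclass
class FakeChunk:
    """Stand-in with only the fields needed for the numbering check."""

    chunk_index: int
    total_chunks: int


def check_chunk_numbering(chunks: list[FakeChunk]) -> None:
    """Raise ValueError unless chunk_index runs 0..n-1 and total_chunks == n everywhere."""
    total = len(chunks)
    for expected_index, chunk in enumerate(chunks):
        if chunk.chunk_index != expected_index:
            raise ValueError(
                f"chunk_index {chunk.chunk_index} found where {expected_index} was expected"
            )
        if chunk.total_chunks != total:
            raise ValueError(
                f"total_chunks {chunk.total_chunks} does not match actual count {total}"
            )


# Passes silently: indices are consecutive from 0 and totals agree.
check_chunk_numbering([FakeChunk(0, 2), FakeChunk(1, 2)])
```

The same check works on real chunk objects as long as they expose chunk_index and total_chunks attributes.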

Custom Vector Database

Inherit from VectorStoreBase, implement create_collection / delete_collection / has_collection / insert / delete / search / list_documents, and manage the underlying connection lifecycle through __aenter__ / __aexit__:
from typing import Any
from agentscope.rag import (
    DocumentSummary,
    VectorRecord,
    VectorSearchResult,
    VectorStoreBase,
)


class MyVectorStore(VectorStoreBase):
    async def __aenter__(self) -> "MyVectorStore":
        self._client = await connect_my_backend(...)
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        await self._client.close()

    async def create_collection(self, name: str, dimensions: int) -> None: ...

    async def delete_collection(self, name: str) -> None: ...

    async def has_collection(self, name: str) -> bool: ...

    async def insert(
        self,
        collection: str,
        records: list[VectorRecord],
    ) -> None: ...

    async def delete(self, collection: str, document_id: str) -> None: ...

    async def search(
        self,
        collection: str,
        query_vector: list[float],
        top_k: int = 5,
        metadata_filter: dict[str, Any] | None = None,
    ) -> list[VectorSearchResult]: ...

    async def list_documents(
        self,
        collection: str,
        metadata_filter: dict[str, Any] | None = None,
    ) -> list[DocumentSummary]: ...
Implementation notes:
  • delete removes every record belonging to a document_id; callers add and remove documents as a unit.
  • search and list_documents must translate metadata_filter into a backend-native payload filter so multi-tenant isolation works.
  • insert must persist both VectorRecord.document_id and the chunk — otherwise delete and list_documents cannot work.
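The three notes above can be illustrated with a toy in-memory store. This is a sketch under stated assumptions, not a VectorStoreBase implementation: ToyStore is hypothetical, its methods are synchronous for brevity (the real interface is async), records are plain dicts instead of VectorRecord objects, and cosine similarity is an assumed scoring function.

```python
import math
from collections import defaultdict
from typing import Any, Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyStore:
    """In-memory stand-in that keeps document_id, vector, chunk, and metadata together."""

    def __init__(self) -> None:
        self._records: dict[str, list[dict[str, Any]]] = defaultdict(list)

    def insert(self, collection: str, records: list[dict[str, Any]]) -> None:
        # Persisting document_id alongside the chunk is what makes delete and listing possible
        self._records[collection].extend(records)

    def delete(self, collection: str, document_id: str) -> None:
        # Remove every record of the document as a unit
        self._records[collection] = [
            r for r in self._records[collection] if r["document_id"] != document_id
        ]

    def _scoped(self, collection: str, metadata_filter: Optional[dict[str, Any]]):
        wanted = metadata_filter or {}
        return [
            r for r in self._records[collection]
            if all(k in r["metadata"] and r["metadata"][k] == v for k, v in wanted.items())
        ]

    def search(self, collection, query_vector, top_k=5, metadata_filter=None):
        scored = [
            (cosine(query_vector, r["vector"]), r)
            for r in self._scoped(collection, metadata_filter)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]

    def list_documents(self, collection, metadata_filter=None) -> dict[str, int]:
        counts: dict[str, int] = defaultdict(int)
        for r in self._scoped(collection, metadata_filter):
            counts[r["document_id"]] += 1
        return dict(counts)  # document_id -> chunk count


toy = ToyStore()
toy.insert("shared", [
    {"document_id": "d1", "vector": [1.0, 0.0], "chunk": "cats", "metadata": {"tenant_id": "a"}},
    {"document_id": "d1", "vector": [0.8, 0.6], "chunk": "naps", "metadata": {"tenant_id": "a"}},
    {"document_id": "d2", "vector": [0.0, 1.0], "chunk": "dogs", "metadata": {"tenant_id": "b"}},
])
top = toy.search("shared", [1.0, 0.0], top_k=1, metadata_filter={"tenant_id": "a"})
```

A real backend would push the metadata filter down into its native query language rather than scanning records in Python, which is the point of the second note above.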

Further Reading

RAG Service

A multi-tenant, distributed RAG service with HTTP API, file hosting, and managed vector databases.

Middleware

See how RAGMiddleware plugs into the reply / reasoning hooks.

Embedding Model

Available embedding models and their parameters.