# RAG

> Build retrieval-augmented generation (RAG) capabilities for agents.

In AgentScope, RAG is composed of the following **independently replaceable** modules:

| Module               | Description                                                                                                                                                                                |
| -------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Parser               | Splits a raw file into a list of `Section` objects, where each `Section` corresponds to a natural boundary in the source (PDF page, PPTX slide, Markdown heading block, whole image, etc.) |
| Chunker              | Cuts `Section`s into the final `Chunk`s to be indexed; never merges across `Section`s                                                                                                      |
| Embedding Model      | Embeds a `Chunk`'s text or multimodal content into a vector                                                                                                                                |
| Vector Store         | Connects to a vector database, stores `Chunk` vectors with metadata, and supports retrieval                                                                                                |
| KnowledgeBase handle | Binds together an embedding model, a vector store, and a collection, exposing `insert_document` / `search` / `list_documents` / `delete_document` as the one-stop entry point              |

This chapter focuses on **using RAG in non-service scenarios** — indexing files, retrieving knowledge, and integrating with an agent.

<Tip>
  For embedding models and how to configure them, see the [Embedding Model chapter](/versions/2.0.3/en/building-blocks/model); for the service version of RAG (with an HTTP service, file hosting, and distributed indexing), see [RAG Service](/versions/2.0.3/en/deploy/rag).
</Tip>

## Existing Implementations

AgentScope ships out-of-the-box default implementations for every module, all inheriting from base classes so users can easily swap them out:

### Parser

| Class              | Description                                                                                                                                                                                                               | Supported File Types                                                                                                                                          |
| ------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `TextParser`       | Text parser: the entire file is returned as a single `Section` and split downstream by a chunker                                                                                                                          | `text/plain`<br />`text/markdown`<br />`text/csv`<br />`text/html`<br />`text/x-rst`<br />`application/json`<br />`application/xml`<br />`application/x-yaml` |
| `PDFParser`        | PDF parser, **one `Section` per page**; the metadata carries a `page` field that starts at 1.                                                                                                                             | `application/pdf`                                                                                                                                             |
| `PPTParser`        | PowerPoint (`.pptx`) parser, walks slides in order:<br />- text/tables are merged into the same `Section`,<br />- images are read as standalone `DataBlock`s.<br />The metadata carries a `slide` field that starts at 1. | `application/vnd.openxmlformats-officedocument.presentationml.presentation`                                                                                   |
| `ImageParser`      | Image parser, reads the entire image as a single `Section`                                                                                                                                                                | `image/png`<br />`image/jpeg`<br />`image/gif`<br />`image/bmp`<br />`image/webp`                                                                             |
| In development ... |                                                                                                                                                                                                                           |                                                                                                                                                               |

<Tip>
  The PDF and PPT parsers depend on additional third-party libraries; install them all at once with `pip install "agentscope[rag]"` (the quotes stop shells such as zsh from interpreting the brackets).
</Tip>

### Chunker

| Class                | Description                                                                                                                                                                                |
| -------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `ApproxTokenChunker` | Splits text by approximate token count, without depending on any tokenizer.<br />Approximation strategy: `len(text.encode("utf-8")) // 4`; multimodal `DataBlock`s pass through unchanged. |
| In development ...   |                                                                                                                                                                                            |
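
The approximation rule is cheap enough to reproduce in a few lines. The sketch below is not AgentScope's `ApproxTokenChunker`: the function names are hypothetical, and it assumes that `chunk_size` and `overlap` (the constructor arguments used later on this page) are both measured in approximate tokens, i.e. byte budgets of `4 * chunk_size` and `4 * overlap` UTF-8 bytes.

```python
def approx_tokens(text: str) -> int:
    """Estimate the token count as UTF-8 byte length // 4 (no tokenizer needed)."""
    return len(text.encode("utf-8")) // 4


def split_by_approx_tokens(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Hypothetical sliding-window splitter built on the byte-based estimate."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("require 0 <= overlap < chunk_size")
    max_bytes = chunk_size * 4
    overlap_bytes = overlap * 4
    pieces: list[str] = []
    start = 0
    while start < len(text):
        end, used = start, 0
        # Grow the window while it stays within the byte budget
        # (always take at least one character so the loop makes progress)
        while end < len(text):
            size = len(text[end].encode("utf-8"))
            if used + size > max_bytes and end > start:
                break
            used += size
            end += 1
        pieces.append(text[start:end])
        if end == len(text):
            break
        # Step back so the next window re-reads up to `overlap_bytes` of this one
        next_start, tail = end, 0
        while next_start - 1 > start:
            size = len(text[next_start - 1].encode("utf-8"))
            if tail + size > overlap_bytes:
                break
            tail += size
            next_start -= 1
        start = next_start
    return pieces
```

Because a CJK character takes 3 UTF-8 bytes, the estimate counts it as three quarters of a token; the rule trades accuracy for having no tokenizer dependency.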

### Embedding Model

See the [Embedding Model chapter](/versions/2.0.3/en/building-blocks/model).

### Vector Database

| Class              | Description                                                                                                                                                    |
| ------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `QdrantStore`      | Qdrant-based vector database implementation, supporting in-memory (`location=":memory:"`), local disk (`path=...`), and remote service (`url=...`) deployments |
| In development ... |                                                                                                                                                                |

## Using RAG

AgentScope recommends the **`KnowledgeBase` handle** as the entry point for RAG. It binds an embedding model, a vector store, and a collection (plus an optional `metadata_filter` for multi-tenant isolation) together and exposes only four operations:

| Method                                                              | Description                                                                                                                |
| ------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------- |
| `insert_document(chunks, document_id=None, document_metadata=None)` | Embeds and writes a batch of `Chunk`s as a single document; returns the `document_id`                                      |
| `search(queries, top_k=5, score_threshold=None)`                    | Runs vector retrieval over a list of queries (`str` / `TextBlock` / `DataBlock`), with automatic deduplication and sorting |
| `delete_document(document_id)`                                      | Removes every chunk of one document by its `document_id`                                                                   |
| `list_documents()`                                                  | Returns a list of `DocumentSummary` entries for every document in this knowledge base                                      |

### Indexing a File

Indexing a file goes through three steps — **file parsing → chunking → embedding + insertion** — one per module. The end-to-end flow:

<Steps>
  <Step title="Parse the file">
    Call the parser's `parse` method to read the raw file into a list of `Section`s, where each `Section` corresponds to a natural boundary in the source (PDF page / PPT slide / image …).

    The `file` parameter of `parse(file, filename)` accepts both **`bytes`** and **`str`**:

    * `bytes` is treated as the raw file content;
    * `str` in a binary parser (`PDFParser` / `PPTParser` / `ImageParser`) is a **filesystem path** that the parser reads from disk for you;
    * `str` in `TextParser` is disambiguated at runtime — if the string names an existing file it is read and decoded with the configured `encoding`; otherwise it is used verbatim as pre-decoded text.

    <CodeGroup>
      ```python Text file theme={null}
      from agentscope.rag import TextParser

      parser = TextParser()

      # 1) Pass bytes directly
      sections = await parser.parse(
          file=b"# Cats\nCats sleep 12-16 hours per day.\n",
          filename="cats.md",
      )

      # 2) Pass a file path (an existing file is read from disk)
      sections = await parser.parse(file="./cats.md", filename="cats.md")
      ```

      ```python PDF theme={null}
      from agentscope.rag import PDFParser

      parser = PDFParser()

      # 1) Pass a file path
      sections = await parser.parse(file="./report.pdf", filename="report.pdf")

      # 2) Or pass bytes directly (e.g. from an HTTP upload / blob store)
      with open("report.pdf", "rb") as f:
          sections = await parser.parse(file=f.read(), filename="report.pdf")

      # Each section.metadata contains {"page": N}
      ```

      ```python PowerPoint theme={null}
      from agentscope.rag import PPTParser

      parser = PPTParser(
          include_image=True,          # Whether to extract embedded images
          separate_table=False,        # Whether to emit tables as their own Section
          table_format="markdown",     # Table rendering: "markdown" or "json"
          slide_prefix="<slide index={index}>",
          slide_suffix="</slide>",
      )

      # File path
      sections = await parser.parse(file="./deck.pptx", filename="deck.pptx")
      ```

      ```python Image theme={null}
      from agentscope.rag import ImageParser

      parser = ImageParser()

      # File path
      sections = await parser.parse(file="./cat.png", filename="cat.png")
      # The entire image is wrapped as a single DataBlock; metadata records the media_type
      ```
    </CodeGroup>
  </Step>

  <Step title="Split into Chunks">
    Call the chunker's `chunk` method to turn the `Section` list into the final `Chunk` list to be indexed. Conventions: never merge across `Section`s; multimodal `DataBlock`s pass through as whole chunks; `chunk_index` runs consecutively from 0; every chunk carries the same `total_chunks`.

    ```python theme={null}
    from agentscope.rag import ApproxTokenChunker

    chunker = ApproxTokenChunker(chunk_size=256, overlap=32)
    chunks = await chunker.chunk(sections)
    ```
  </Step>

  <Step title="Write to the knowledge base">
    Construct a `KnowledgeBase` handle and write the chunk list — embedding and storage are taken care of by the handle. All chunks of the same document share one `document_id`, which makes whole-document deletion easy.

    ```python theme={null}
    from agentscope.credential import DashScopeCredential
    from agentscope.embedding import DashScopeEmbeddingModel
    from agentscope.rag import KnowledgeBase, QdrantStore

    embedding_model = DashScopeEmbeddingModel(
        credential=DashScopeCredential(api_key="YOUR_API_KEY"),
        model="text-embedding-v4",
        dimensions=1024,
    )

    # QdrantStore is an async context manager; entering it opens the client connection
    store = QdrantStore(location=":memory:")  # or url="http://..." for a real cluster

    async with store:
        knowledge = KnowledgeBase(
            name="demo-kb",
            description="A toy corpus.",
            embedding_model=embedding_model,
            vector_store=store,
            collection="demo-kb",
        )
        # The backing collection is created on first use, sized to embedding_model.dimensions
        document_id = await knowledge.insert_document(
            chunks,
            document_metadata={"filename": "cats.md"},
        )
    ```

    <Note>
      `KnowledgeBase` does not open or close the vector store connection itself; enter the `VectorStoreBase` instance in an `async with` block before using it.
    </Note>
  </Step>
</Steps>

### Vector Retrieval

Call `KnowledgeBase.search` directly with a list of query strings / `TextBlock`s / `DataBlock`s — no manual embedding required:

```python theme={null}
async with store:
    results = await knowledge.search(
        queries=["When do cats sleep?"],
        top_k=3,
        score_threshold=None,  # only meaningful for cosine / dot-product
    )
    for r in results:
        print(r.score, r.document_id, r.chunk.content)
```

`search` does the following internally:

1. **Drops unusable queries**: when the bound embedding model's `supports_multimodal == False`, `DataBlock` queries are silently dropped.
2. **Batched embedding**: all queries are embedded in a single batch, and the collection is then searched concurrently with each query vector.
3. **Deduplication**: hits are deduplicated by `(document_id, chunk_index)` keeping the highest score.
4. **Truncation**: results are sorted by descending score and truncated to `top_k`.

The return value is a list of `VectorSearchResult`s; each entry carries `score`, `document_id`, and the matched `chunk`.
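
The deduplication and truncation steps (3 and 4) can be sketched in plain Python. `Hit` is a hypothetical stand-in that flattens a `VectorSearchResult` and its `chunk` into one record; the real classes carry more fields.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical flattened stand-in for VectorSearchResult plus its chunk."""

    score: float
    document_id: str
    chunk_index: int
    text: str


def merge_hits(hits_per_query: list[list[Hit]], top_k: int) -> list[Hit]:
    # Step 3: deduplicate by (document_id, chunk_index), keeping the highest score
    best: dict[tuple[str, int], Hit] = {}
    for hits in hits_per_query:
        for hit in hits:
            key = (hit.document_id, hit.chunk_index)
            if key not in best or hit.score > best[key].score:
                best[key] = hit
    # Step 4: sort by descending score and truncate to top_k
    ranked = sorted(best.values(), key=lambda h: h.score, reverse=True)
    return ranked[:top_k]
```

A chunk matched by several queries therefore appears once, ranked by its best score across all queries.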

### Document Management

`KnowledgeBase` exposes two document-level helpers:

```python theme={null}
# List every document (one DocumentSummary per document)
summaries = await knowledge.list_documents()
for s in summaries:
    print(s.document_id, s.source, s.chunk_count, s.metadata)

# Delete every chunk belonging to one document
await knowledge.delete_document(document_id)
```

`DocumentSummary` carries the `document_id`, the original filename `source`, `chunk_count`, and the `metadata` recorded on the first chunk by the parser / uploader.
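
The relationship between stored chunks and `DocumentSummary` can be sketched as a group-by over chunk records. `Record` and `Summary` are hypothetical stand-ins, and the sketch assumes "first chunk" means the chunk with the lowest `chunk_index`.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Record:
    """Hypothetical stand-in for one stored chunk record."""

    document_id: str
    source: str
    chunk_index: int
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class Summary:
    """Hypothetical stand-in for DocumentSummary."""

    document_id: str
    source: str
    chunk_count: int
    metadata: dict[str, Any]


def summarize(records: list[Record]) -> list[Summary]:
    # Group records by document, preserving first-seen document order
    by_document: dict[str, list[Record]] = {}
    for record in records:
        by_document.setdefault(record.document_id, []).append(record)
    summaries = []
    for document_id, group in by_document.items():
        # Take source and metadata from the chunk with the lowest chunk_index
        first = min(group, key=lambda r: r.chunk_index)
        summaries.append(
            Summary(document_id, first.source, len(group), dict(first.metadata)),
        )
    return summaries
```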

### Multi-tenant Isolation: `metadata_filter`

When multiple logical knowledge bases need to share one physical collection, pass a `metadata_filter` when constructing the `KnowledgeBase` (a typical pattern is stamping every record with `{"tenant_id": "..."}`):

```python theme={null}
knowledge = KnowledgeBase(
    name="tenant-a-kb",
    description="...",
    embedding_model=embedding_model,
    vector_store=store,
    collection="shared",
    metadata_filter={"tenant_id": "tenant-a"},
)
```

`metadata_filter` is a **defense-in-depth** mechanism:

* `search` and `list_documents` only consider records that match every `key == value` pair in the filter, so records outside the scope are never returned.
* `insert_document` **forces** the same metadata fields onto every chunk, so even a buggy or malicious parser cannot rebind a record into another scope.

`None` (the default) disables filtering — appropriate for deployments where every knowledge base owns its own collection outright.

### Multimodal Support

AgentScope's RAG natively supports the ingestion and retrieval of multimodal data — the key is matching the parser's and the embedding model's capabilities: the former must be able to parse multimodal files into `DataBlock`s, the latter must be able to embed `DataBlock`s directly.

* **Check which file types a Parser supports**: every `ParserBase` subclass declares its capability via the class attribute `supported_media_types` (a list of IANA media types), which you can read directly or auto-complete in your IDE.

  ```python theme={null}
  >>> from agentscope.rag import TextParser, ImageParser
  >>> TextParser.supported_media_types
  ['text/plain', 'text/markdown', 'text/csv', 'text/html', 'text/x-rst',
   'application/json', 'application/xml', 'application/x-yaml']
  >>> ImageParser.supported_media_types
  ['image/png', 'image/jpeg', 'image/gif', 'image/bmp', 'image/webp']
  ```

* **Check which modalities an embedding model supports**: the instance attribute `embedding_model.supports_multimodal` tells whether the model can directly handle `DataBlock`s (images / video / audio).

  ```python theme={null}
  >>> embedding_model.supports_multimodal
  True
  ```

When the parser yields `Chunk`s containing multimodal content and `embedding_model.supports_multimodal == True`, the ingestion and retrieval pipelines work without any extra configuration. With a text-only model, `KnowledgeBase.search` silently drops `DataBlock` queries instead of raising an error.

### Integrating with an Agent

`RAGMiddleware` plugs retrieval into the `Agent` class's reasoning-acting loop. The middleware does not own the embedding model or the vector store — it consumes **a list of pre-built `KnowledgeBase` handles**, which may mix knowledge bases that use different embedding models.

`RAGMiddleware` supports two working modes (`RAGMiddleware.Parameters.mode`), which can be used individually or **combined** (by attaching two instances with different `mode`s):

| Mode                  | Trigger                                                                         | Retrieval Query                                                      | Injection                                                                                    |
| --------------------- | ------------------------------------------------------------------------------- | -------------------------------------------------------------------- | -------------------------------------------------------------------------------------------- |
| `"static"`            | Before the **first reasoning step** of each reply (`agent.state.cur_iter == 0`) | The input message of the reply method is used as the retrieval query | Retrieval results are wrapped into a `HintBlock` and injected into the context               |
| `"agentic"` (default) | The model invokes the retrieval tool on its own                                 | Decided by the model itself                                          | Exposes a `search_knowledge` tool — the agent decides when to retrieve and what query to use |

All parameters are wrapped in the nested `RAGMiddleware.Parameters` model:

| Field             | Default     | Description                                                                                                                                                 |
| ----------------- | ----------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `mode`            | `"agentic"` | Integration mode, see above                                                                                                                                 |
| `top_k`           | `5`         | Maximum number of hits returned in one search, deduplicated across knowledge bases and query inputs before truncation                                       |
| `score_threshold` | `None`      | Minimum similarity threshold; only meaningful under cosine / dot-product                                                                                    |
| `emit_hint_event` | `True`      | In `static` mode, whether to additionally emit a `HintBlockEvent` so the frontend can display the matched snippets                                          |
| `persist_hint`    | `False`     | In `static` mode, whether the injected block stays persistently in the context (it is removed after reasoning by default, to avoid polluting the next turn) |

In addition, in `agentic` mode `RAGMiddleware.list_tools()` returns a single `search_knowledge` tool — you must manually register it in the agent's `Toolkit` so the model can call it. The tool's description automatically lists the `name` / `description` of every attached knowledge base; the model can also restrict a search to a subset via the `knowledge_bases=[...]` argument.

Configure RAG on an agent instance with the following code:

<CodeGroup>
  ```python static mode theme={null}
  from agentscope.middleware import RAGMiddleware
  from agentscope.tool import Toolkit

  static_mw = RAGMiddleware(
      knowledges=[knowledge],            # One or more KnowledgeBase handles
      parameters=RAGMiddleware.Parameters(
          mode="static",
          top_k=3,
          emit_hint_event=False,
      ),
  )

  agent = Agent(
      name="static-agent",
      system_prompt="Answer the user's question using the retrieved material.",
      model=chat_model,
      toolkit=Toolkit(),
      middlewares=[static_mw],
  )
  ```

  ```python agentic mode theme={null}
  from agentscope.middleware import RAGMiddleware
  from agentscope.tool import Toolkit

  agentic_mw = RAGMiddleware(
      knowledges=[knowledge],
      parameters=RAGMiddleware.Parameters(mode="agentic", top_k=3),
  )

  # Note: in agentic mode you must manually inject the search_knowledge tool into the Toolkit
  toolkit = Toolkit(tools=await agentic_mw.list_tools())

  agent = Agent(
      name="agentic-agent",
      system_prompt="When necessary, call the search_knowledge tool to look up material.",
      model=chat_model,
      toolkit=toolkit,
      middlewares=[agentic_mw],
  )
  ```

  ```python combined modes theme={null}
  from agentscope.middleware import RAGMiddleware
  from agentscope.tool import Toolkit

  # static gives the first turn some background automatically;
  # agentic lets the model fetch more on demand
  static_mw = RAGMiddleware(
      knowledges=[knowledge],
      parameters=RAGMiddleware.Parameters(mode="static", top_k=3),
  )
  agentic_mw = RAGMiddleware(
      knowledges=[knowledge],
      parameters=RAGMiddleware.Parameters(mode="agentic", top_k=3),
  )

  # Note: in agentic mode you must manually inject the search_knowledge tool into the Toolkit
  toolkit = Toolkit(tools=await agentic_mw.list_tools())

  agent = Agent(
      name="hybrid-agent",
      system_prompt="Answer using the retrieved material; call search_knowledge for more when needed.",
      model=chat_model,
      toolkit=toolkit,
      middlewares=[static_mw, agentic_mw],
  )
  ```
</CodeGroup>

## Custom Extensions

All RAG modules use base-class inheritance, so users can customize Parser, Chunker, Embedding Model, and Vector Store — inherit from the corresponding base class, implement its core methods, and the custom class slots seamlessly into the pipeline above.

<Tip>
  Contributions of new Parsers, Chunkers, and Vector Stores to the official AgentScope repository are welcome!
</Tip>

### Custom Parser

Inherit from `ParserBase`, declare the IANA media types you can handle in the class attribute `supported_media_types`, and implement `async def parse(file, filename)` to split the input (`bytes` or `str`) into a list of `Section`s:

```python theme={null}
import re

from agentscope.message import TextBlock
from agentscope.rag import ParserBase, Section


class MyMarkdownParser(ParserBase):
    supported_media_types = ["text/markdown"]

    async def parse(
        self,
        file: bytes | str,
        filename: str,
    ) -> list[Section]:
        # This sketch treats a str as pre-decoded text (no path lookup)
        text = file.decode("utf-8") if isinstance(file, bytes) else file
        # Split before each H2 heading so every heading stays with its body;
        # whitespace-only blocks are dropped
        blocks = [b for b in re.split(r"\n(?=## )", text) if b.strip()]
        return [
            Section(
                content=TextBlock(text=block),
                source=filename,
                metadata={"index": index},
            )
            for index, block in enumerate(blocks)
        ]
```

You may also override `supported_extensions()`. By default it reverse-looks-up file extensions from `supported_media_types`, which can surface extensions that developers rarely expect; override it explicitly when you want a front-end file picker to show only a curated set.
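
To see why a reverse lookup can be noisy, the sketch below assumes it is done with Python's standard `mimetypes` registry; AgentScope's default may use a different table, and `reverse_lookup_extensions` is a hypothetical name.

```python
import mimetypes


def reverse_lookup_extensions(media_types: list[str]) -> list[str]:
    """Collect every extension the stdlib registry maps to the given media types."""
    extensions: list[str] = []
    for media_type in media_types:
        for ext in mimetypes.guess_all_extensions(media_type):
            if ext not in extensions:  # keep order, drop duplicates
                extensions.append(ext)
    return extensions


print(reverse_lookup_extensions(["text/plain"]))
```

The exact list depends on the Python version and the system's MIME tables, but `text/plain` typically yields several extensions besides `.txt`; an explicit override would return a short hand-picked list such as `[".md", ".markdown"]` instead.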

### Custom Chunker

Inherit from `ChunkerBase` and implement `async def chunk(sections)` to turn a list of `Section`s into the `Chunk`s to be indexed. Conventions: never merge across `Section`s; multimodal `DataBlock`s pass through as whole chunks; `chunk_index` runs consecutively from 0 across the result list; `total_chunks` stays consistent on every chunk:

```python theme={null}
from agentscope.message import TextBlock
from agentscope.rag import Chunk, ChunkerBase, Section


class FixedCharChunker(ChunkerBase):
    def __init__(self, chunk_size: int = 1000) -> None:
        self._chunk_size = chunk_size

    async def chunk(self, sections: list[Section]) -> list[Chunk]:
        chunks: list[Chunk] = []
        for section in sections:
            # Multimodal content is not split — pass through as a whole chunk
            if not isinstance(section.content, TextBlock):
                chunks.append(
                    Chunk(
                        content=section.content,
                        source=section.source,
                        chunk_index=0,
                        total_chunks=0,
                        metadata=dict(section.metadata),
                    ),
                )
                continue
            text = section.content.text
            for start in range(0, len(text), self._chunk_size):
                chunks.append(
                    Chunk(
                        content=TextBlock(
                            text=text[start : start + self._chunk_size],
                        ),
                        source=section.source,
                        chunk_index=0,
                        total_chunks=0,
                        metadata=dict(section.metadata),
                    ),
                )
        # Replace the placeholders: consecutive indices from 0, same total on every chunk
        for index, chunk in enumerate(chunks):
            chunk.chunk_index = index
            chunk.total_chunks = len(chunks)
        return chunks
```
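
Because the numbering conventions are easy to break when a chunker emits placeholders first, a small check is handy in unit tests. The sketch only relies on each chunk exposing `chunk_index` and `total_chunks` attributes, as the `Chunk` objects above do; `find_numbering_problems` and `ChunkStub` are hypothetical names, and the cross-`Section` merge rule is not checked here.

```python
from dataclasses import dataclass
from typing import Any


def find_numbering_problems(chunks: list[Any]) -> list[str]:
    """Report chunks whose chunk_index / total_chunks break the chunker conventions."""
    problems: list[str] = []
    total = len(chunks)
    for position, chunk in enumerate(chunks):
        if chunk.chunk_index != position:
            problems.append(f"chunk {position}: chunk_index is {chunk.chunk_index}")
        if chunk.total_chunks != total:
            problems.append(
                f"chunk {position}: total_chunks is {chunk.total_chunks}, expected {total}",
            )
    return problems


@dataclass
class ChunkStub:
    """Hypothetical stand-in exposing only the two numbering fields."""

    chunk_index: int
    total_chunks: int


print(find_numbering_problems([ChunkStub(0, 2), ChunkStub(1, 2)]))  # prints []
```

In a test, call it on the output of `await FixedCharChunker().chunk(sections)` and assert that the returned list is empty.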

### Custom Vector Database

Inherit from `VectorStoreBase`, implement `create_collection` / `delete_collection` / `has_collection` / `insert` / `delete` / `search` / `list_documents`, and manage the underlying connection lifecycle through `__aenter__` / `__aexit__`:

```python theme={null}
from typing import Any
from agentscope.rag import (
    DocumentSummary,
    VectorRecord,
    VectorSearchResult,
    VectorStoreBase,
)


class MyVectorStore(VectorStoreBase):
    async def __aenter__(self) -> "MyVectorStore":
        # connect_my_backend is a placeholder for your backend's async client factory
        self._client = await connect_my_backend(...)
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        await self._client.close()

    async def create_collection(self, name: str, dimensions: int) -> None: ...

    async def delete_collection(self, name: str) -> None: ...

    async def has_collection(self, name: str) -> bool: ...

    async def insert(
        self,
        collection: str,
        records: list[VectorRecord],
    ) -> None: ...

    async def delete(self, collection: str, document_id: str) -> None: ...

    async def search(
        self,
        collection: str,
        query_vector: list[float],
        top_k: int = 5,
        metadata_filter: dict[str, Any] | None = None,
    ) -> list[VectorSearchResult]: ...

    async def list_documents(
        self,
        collection: str,
        metadata_filter: dict[str, Any] | None = None,
    ) -> list[DocumentSummary]: ...
```

Implementation notes:

* `delete` removes **every** record belonging to a `document_id`; callers add and remove documents as a unit.
* `search` and `list_documents` must translate `metadata_filter` into a backend-native payload filter so multi-tenant isolation works.
* `insert` must persist both `VectorRecord.document_id` and the `chunk` — otherwise `delete` and `list_documents` cannot work.

## Further Reading

<CardGroup cols={1}>
  <Card title="RAG Service" icon="server" href="/versions/2.0.3/en/deploy/rag" cta="View deployment docs" arrow>
    A multi-tenant, distributed RAG service with HTTP API, file hosting, and managed vector databases.
  </Card>

  <Card title="Middleware" icon="layer-group" href="/versions/2.0.3/en/building-blocks/middleware" cta="Learn the middleware mechanism" arrow>
    See how `RAGMiddleware` plugs into the reply / reasoning hooks.
  </Card>

  <Card title="Embedding Model" icon="cube" href="/versions/2.0.3/en/building-blocks/model" cta="View model list" arrow>
    Available embedding models and their parameters.
  </Card>
</CardGroup>