| Module | Description |
|---|---|
| Parser | Splits a raw file into a list of Section objects, where each Section corresponds to a natural boundary in the source (PDF page, PPTX slide, Markdown heading block, whole image, etc.) |
| Chunker | Cuts Sections into the final Chunks to be indexed; never merges across Sections |
| Embedding Model | Embeds a Chunk’s text or multimodal content into a vector |
| Vector Store | Connects to a vector database, stores Chunk vectors with metadata, and supports retrieval |
| KnowledgeBase handle | Binds together an embedding model, a vector store, and a collection, exposing insert_document / search / list_documents / delete_document as the one-stop entry point |
Existing Implementations
AgentScope ships out-of-the-box default implementations for every module, all inheriting from base classes so users can easily swap them out:
Parser
| Class | Description | Supported File Types |
|---|---|---|
| TextParser | Text parser: the entire file is returned as a single Section and split downstream by a chunker | text/plain, text/markdown, text/csv, text/html, text/x-rst, application/json, application/xml, application/x-yaml |
| PDFParser | PDF parser, one Section per page; the metadata carries a page field that starts at 1 | application/pdf |
| PPTParser | PowerPoint (.pptx) parser, walks slides in order: text/tables are merged into the same Section; images are read as standalone DataBlocks. The metadata carries a slide field that starts at 1 | application/vnd.openxmlformats-officedocument.presentationml.presentation |
| ImageParser | Image parser, reads the entire image as a single Section | image/png, image/jpeg, image/gif, image/bmp, image/webp |
| In development … | | |
Chunker
| Class | Description |
|---|---|
| ApproxTokenChunker | Splits text by approximate token count, without depending on any tokenizer. Approximation strategy: len(text.encode("utf-8")) // 4; multimodal DataBlocks pass through unchanged. |
| In development … | |
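The byte-length approximation in the table can be illustrated with a minimal sketch. It is not AgentScope's implementation: approx_token_count and split_by_approx_tokens are hypothetical helper names, and greedy packing of whitespace-separated words is an assumption chosen for brevity.

```python
def approx_token_count(text: str) -> int:
    """Approximate the token count as UTF-8 byte length divided by 4."""
    return len(text.encode("utf-8")) // 4


def split_by_approx_tokens(text: str, max_tokens: int) -> list[str]:
    """Greedily pack whitespace-separated words into pieces whose
    approximate token count stays within max_tokens (hypothetical helper).

    A single word that exceeds the budget on its own becomes its own piece.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    pieces: list[str] = []
    current: list[str] = []
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and approx_token_count(candidate) > max_tokens:
            # Adding this word would overflow the budget: close the piece.
            pieces.append(" ".join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        pieces.append(" ".join(current))
    return pieces
```

Because the estimate counts bytes rather than characters, non-ASCII text is treated as denser than ASCII text of the same length, which keeps the estimate tokenizer-free at the cost of precision.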
Embedding Model
See the Embedding Model chapter.
Vector Database
| Class | Description |
|---|---|
| QdrantStore | Qdrant-based vector database implementation, supporting in-memory (location=":memory:"), local disk (path=...), and remote service (url=...) deployments |
| In development … | |
Using RAG
AgentScope recommends going through the KnowledgeBase handle as the entry point for RAG. It binds together an embedding model, a vector store, and a collection (plus an optional metadata_filter for multi-tenant isolation) and exposes only four operations:
| Method | Description |
|---|---|
| insert_document(chunks, document_id=None, document_metadata=None) | Embeds and writes a batch of Chunks as a single document; returns the document_id |
| search(queries, top_k=5, score_threshold=None) | Runs vector retrieval over a list of queries (str / TextBlock / DataBlock), with automatic deduplication and sorting |
| delete_document(document_id) | Removes every chunk of one document by its document_id |
| list_documents() | Returns a list of DocumentSummary entries for every document in this knowledge base |
Indexing a File
Indexing a file goes through three steps, one per module: file parsing → chunking → embedding + insertion. The end-to-end flow:
Parse the file
Call the parser’s parse method to read the raw file into a list of Sections, where each Section corresponds to a natural boundary in the source (PDF page / PPT slide / image …). The file parameter of parse(file, filename) accepts both bytes and str:
- bytes is treated as the raw file content;
- str in a binary parser (PDFParser / PPTParser / ImageParser) is a filesystem path that the parser reads from disk for you;
- str in TextParser is disambiguated at runtime: if the string names an existing file, it is read and decoded with the configured encoding; otherwise it is used verbatim as pre-decoded text.
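The TextParser rule for str inputs can be sketched in a few lines. This is a minimal illustration under stated assumptions, not AgentScope’s code: resolve_text_input is a hypothetical helper, and the UTF-8 default stands in for the parser’s configured encoding.

```python
from pathlib import Path
from typing import Union


def resolve_text_input(file: Union[bytes, str], encoding: str = "utf-8") -> str:
    """Return decoded text for a bytes-or-str input (hypothetical helper)."""
    if isinstance(file, bytes):
        # bytes are treated as the raw file content.
        return file.decode(encoding)
    try:
        path = Path(file)
        if path.is_file():
            # The string names an existing file: read and decode it.
            return path.read_text(encoding=encoding)
    except (OSError, ValueError):
        # Strings that cannot be valid paths (too long, embedded NUL)
        # are treated as text.
        pass
    # Otherwise the string is used verbatim as pre-decoded text.
    return file
```

One consequence of this kind of runtime check is that a short text snippet that happens to match an existing filename is read from disk, so passing bytes is the unambiguous choice when the content is already in memory.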
Split into Chunks
Call the chunker’s chunk method to turn the Section list into the final Chunk list to be indexed. Conventions: never merge across Sections; multimodal DataBlocks pass through as whole chunks; chunk_index runs consecutively from 0; every chunk carries the same total_chunks.
Write to the knowledge base
Construct a KnowledgeBase handle and write the chunk list; embedding and storage are taken care of by the handle. All chunks of the same document share one document_id, which makes whole-document deletion easy.
KnowledgeBase does not open or close the vector store connection itself; enter the VectorStoreBase instance in an async with block before using it.
Vector Retrieval
Call KnowledgeBase.search directly with a list of query strings / TextBlocks / DataBlocks; no manual embedding is required:
search does the following internally:
- Drops unusable queries: when the bound embedding model’s supports_multimodal == False, DataBlock queries are silently dropped.
- Batched embedding: every query is embedded in a single batch, then the collection is searched concurrently.
- Deduplication: hits are deduplicated by (document_id, chunk_index), keeping the highest score.
- Truncation: results are sorted by descending score and truncated to top_k.
search returns a list of VectorSearchResults; each entry carries score, document_id, and the matched chunk.
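The deduplication and truncation steps can be illustrated with a small standalone sketch. Hit and merge_hits are hypothetical stand-ins introduced here for illustration; they are not the library’s result type or its internal code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Stand-in for one search result (hypothetical, not the library's type)."""
    score: float
    document_id: str
    chunk_index: int


def merge_hits(hits: list[Hit], top_k: int = 5) -> list[Hit]:
    """Deduplicate by (document_id, chunk_index), keep the best score,
    then sort by descending score and truncate to top_k."""
    best: dict[tuple[str, int], Hit] = {}
    for hit in hits:
        key = (hit.document_id, hit.chunk_index)
        if key not in best or hit.score > best[key].score:
            best[key] = hit
    ranked = sorted(best.values(), key=lambda h: h.score, reverse=True)
    return ranked[:top_k]
```

Deduplicating before truncating matters when several queries hit the same chunk: truncating first could fill the top_k slots with repeats of a single chunk.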
Document Management
KnowledgeBase exposes two document-level helpers:
DocumentSummary carries the document_id, the original filename source, chunk_count, and the metadata recorded on the first chunk by the parser / uploader.
Multi-tenant Isolation: metadata_filter
When multiple logical knowledge bases need to share one physical collection, pass a metadata_filter when constructing the KnowledgeBase (a typical pattern is stamping every record with {"tenant_id": "..."}):
metadata_filter is a defense-in-depth mechanism:
- search and list_documents restrict records to those matching every key == value pair, so nothing ever escapes the scope.
- insert_document forces the same metadata fields onto every chunk, so even a buggy or malicious parser cannot rebind a record into another scope.
None (the default) disables filtering — appropriate for deployments where every knowledge base owns its own collection outright.
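The two halves of this mechanism, matching on read and stamping on write, can be sketched with plain dictionaries. matches_filter and stamp_metadata are hypothetical helpers for illustration only; how AgentScope implements the filter internally is not shown in this document.

```python
from typing import Any, Optional

Metadata = dict[str, Any]


def matches_filter(metadata: Metadata, metadata_filter: Optional[Metadata]) -> bool:
    """Read side: a record is in scope only if every key == value pair matches."""
    if metadata_filter is None:
        # None disables filtering.
        return True
    return all(
        key in metadata and metadata[key] == value
        for key, value in metadata_filter.items()
    )


def stamp_metadata(metadata: Metadata, metadata_filter: Optional[Metadata]) -> Metadata:
    """Write side: force the filter fields onto the record, overriding
    any conflicting value supplied upstream."""
    return {**metadata, **(metadata_filter or {})}
```

For example, stamping {"tenant_id": "evil", "page": 1} under the filter {"tenant_id": "acme"} yields a record that only the "acme" scope can see.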
Multimodal Support
AgentScope’s RAG natively supports the ingestion and retrieval of multimodal data. The key is matching the parser’s and the embedding model’s capabilities: the former must be able to parse multimodal files into DataBlocks, and the latter must be able to embed DataBlocks directly.
- Check which file types a Parser supports: every ParserBase subclass declares its capability via the class attribute supported_media_types (a list of IANA media types), which you can read directly or auto-complete in your IDE.
- Check which modalities an embedding model supports: the instance attribute embedding_model.supports_multimodal tells whether the model can directly handle DataBlocks (images / video / audio).
With Chunks containing multimodal content and embedding_model.supports_multimodal == True, the ingestion and retrieval pipelines work without any extra configuration. Text-only models silently drop DataBlock queries inside KnowledgeBase.search instead of raising.
Integrating with an Agent
RAGMiddleware plugs retrieval into the Agent class’s reasoning-acting loop. The middleware does not own the embedding model or the vector store — it consumes a list of pre-built KnowledgeBase handles, which may mix knowledge bases that use different embedding models.
RAGMiddleware supports two working modes (RAGMiddleware.Parameters.mode), which can be used individually or combined (by attaching two instances with different modes):
| Mode | Trigger | Retrieval Query | Injection |
|---|---|---|---|
| "static" | Before the first reasoning step of each reply (agent.state.cur_iter == 0) | The input message of the reply method is used as the retrieval query | Retrieval results are wrapped into a HintBlock and injected into the context |
| "agentic" (default) | The model invokes the retrieval tool on its own | Decided by the model itself | Exposes a search_knowledge tool; the agent decides when to retrieve and what query to use |
The RAGMiddleware.Parameters model has the following fields:
| Field | Default | Description |
|---|---|---|
| mode | "agentic" | Integration mode, see above |
| top_k | 5 | Maximum number of hits returned in one search; hits are deduplicated across knowledge bases and query inputs before truncation |
| score_threshold | None | Minimum similarity threshold; only meaningful under cosine / dot-product |
| emit_hint_event | True | In static mode, whether to additionally emit a HintBlockEvent so the frontend can display the matched snippets |
| persist_hint | False | In static mode, whether the injected block stays persistently in the context (it is removed after reasoning by default, to avoid polluting the next turn) |
In agentic mode, RAGMiddleware.list_tools() returns a single search_knowledge tool; you must manually register it in the agent’s Toolkit so the model can call it. The tool’s description automatically lists the name / description of every attached knowledge base; the model can also restrict a search to a subset via the knowledge_bases=[...] argument.
Configure RAG on an agent instance with the following code:
Custom Extensions
All RAG modules use base-class inheritance, so users can customize Parser, Chunker, Embedding Model, and Vector Store: inherit from the corresponding base class, implement its core methods, and the custom class slots seamlessly into the pipeline above.
Custom Parser
Inherit from ParserBase, declare the IANA media types you can handle in the class attribute supported_media_types, and implement async def parse(file, filename) to split a byte stream into a list of Sections:
Override supported_extensions() if needed (the default reverse-lookup from supported_media_types produces noisy developer extensions; override it explicitly when you want the front-end file picker to show only a curated set).
Custom Chunker
Inherit from ChunkerBase and implement async def chunk(sections) to turn a list of Sections into the Chunks to be indexed. Conventions: never merge across Sections; multimodal DataBlocks pass through as whole chunks; chunk_index runs consecutively from 0 across the result list; total_chunks stays consistent on every chunk:
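These conventions can be illustrated with a minimal standalone sketch. Everything in it is an assumption made for illustration: Section and Chunk are simplified stand-in dataclasses rather than AgentScope’s types, bytes stands in for a multimodal DataBlock, the function is synchronous rather than an async chunk method on a ChunkerBase subclass, and fixed-width character splitting replaces a real splitting strategy.

```python
from dataclasses import dataclass
from typing import Union

# str = text; bytes = stand-in for a multimodal DataBlock (assumption).
Block = Union[str, bytes]


@dataclass
class Section:
    blocks: list[Block]


@dataclass
class Chunk:
    content: Block
    chunk_index: int
    total_chunks: int


def chunk_sections(sections: list[Section], max_chars: int = 20) -> list[Chunk]:
    """Split text blocks into fixed-width pieces without crossing Sections."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    pieces: list[Block] = []
    for section in sections:
        for block in section.blocks:
            if isinstance(block, bytes):
                # Multimodal content passes through as one whole chunk.
                pieces.append(block)
            else:
                # Each text block is split on its own, so no piece ever
                # mixes text from two Sections.
                pieces.extend(
                    block[i:i + max_chars] for i in range(0, len(block), max_chars)
                )
    total = len(pieces)
    # Indices are assigned last, so they are consecutive from 0 and every
    # chunk carries the same total_chunks.
    return [Chunk(piece, index, total) for index, piece in enumerate(pieces)]
```

Assigning chunk_index and total_chunks in a final pass, after all pieces are known, is what keeps the numbering consistent across the whole result list.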
Custom Vector Database
Inherit from VectorStoreBase, implement create_collection / delete_collection / has_collection / insert / delete / search / list_documents, and manage the underlying connection lifecycle through __aenter__ / __aexit__:
- delete removes every record belonging to a document_id; callers add and remove documents as a unit.
- search and list_documents must translate metadata_filter into a backend-native payload filter so multi-tenant isolation works.
- insert must persist both VectorRecord.document_id and the chunk; otherwise delete and list_documents cannot work.
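The contract in the list above can be made concrete with a toy in-memory store. This is a sketch under stated assumptions, not a VectorStoreBase subclass: Record and InMemoryStore are hypothetical names, the methods are synchronous, collections and the __aenter__ / __aexit__ lifecycle are left out, and cosine similarity is an arbitrary choice of metric.

```python
import math
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Record:
    """Stand-in for a stored vector record (hypothetical fields)."""
    vector: list[float]
    document_id: str
    chunk: str
    metadata: dict[str, Any] = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryStore:
    def __init__(self) -> None:
        self._records: list[Record] = []

    def insert(self, records: list[Record]) -> None:
        # document_id and chunk are stored with the vector, which is what
        # makes delete and list_documents possible later.
        self._records.extend(records)

    def delete(self, document_id: str) -> int:
        """Remove every record of one document; return how many were removed."""
        before = len(self._records)
        self._records = [r for r in self._records if r.document_id != document_id]
        return before - len(self._records)

    @staticmethod
    def _in_scope(record: Record, metadata_filter: Optional[dict[str, Any]]) -> bool:
        if metadata_filter is None:
            return True
        return all(
            key in record.metadata and record.metadata[key] == value
            for key, value in metadata_filter.items()
        )

    def search(self, query, top_k=5, metadata_filter=None):
        """Return (score, record) pairs, best first, restricted to the scope."""
        scored = [
            (cosine(query, r.vector), r)
            for r in self._records
            if self._in_scope(r, metadata_filter)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]

    def list_documents(self, metadata_filter=None) -> dict[str, int]:
        """Map each in-scope document_id to its chunk count."""
        counts: dict[str, int] = {}
        for r in self._records:
            if self._in_scope(r, metadata_filter):
                counts[r.document_id] = counts.get(r.document_id, 0) + 1
        return counts
```

A real backend would push the filter down into its native query language instead of filtering in Python, but the observable behavior (scoped reads, whole-document deletes) should match this sketch.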
Further Reading
RAG Service
A multi-tenant, distributed RAG service with HTTP API, file hosting, and managed vector databases.
Middleware
See how RAGMiddleware plugs into the reply / reasoning hooks.
Embedding Model
Available embedding models and their parameters.