
What is RAG?

Retrieval-Augmented Generation (RAG) gives your agents access to specific documents and data that aren’t part of their pre-training. Instead of relying solely on the language model’s general knowledge, RAG agents retrieve relevant information from your documents and use it to generate informed, grounded responses.

How RAG works

1. Retrieve relevant chunks

When an agent receives a query, the system searches your knowledge base for document chunks semantically similar to the query.

2. Augment the prompt

Retrieved chunks are injected into the agent’s prompt as additional context.

3. Generate informed response

The language model generates a response based on both its general knowledge AND the specific retrieved content.
The key insight: LLMs are excellent at reasoning and language generation, but they don’t know YOUR data. RAG bridges this gap by retrieving your data at query time and providing it as context.
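The three steps above can be sketched as a minimal retrieve-augment-generate loop. This toy Python sketch uses keyword overlap in place of real embeddings; `retrieve` and `augment` are illustrative names, not MagOneAI APIs:

```python
# Toy RAG loop: word overlap stands in for real embedding search.
def retrieve(query, chunks, top_k=2):
    q_words = set(query.lower().split())
    scored = [(len(q_words & set(c.lower().split())), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:top_k] if score > 0]

def augment(query, retrieved):
    # Inject the retrieved chunks into the prompt as grounding context.
    context = "\n".join(f"- {c}" for c in retrieved)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

chunks = [
    "Remote work requires completing the 90-day probationary period.",
    "Expense reports are due by the 5th of each month.",
]
prompt = augment("What is the remote work policy?",
                 retrieve("remote work policy", chunks))
```

The final `prompt` would then be sent to the LLM, which answers from the injected context rather than its training data alone.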

Why use RAG?

Knowledge cutoffs

LLMs have training cutoffs and don’t know information published after that date. RAG gives agents access to current information.

Internal data

Your organization’s policies, procedures, and documentation aren’t in the LLM’s training data. RAG makes this information accessible.

Grounded responses

RAG grounds agent outputs in specific source documents, reducing hallucinations and enabling citation of sources.

Dynamic knowledge

Update your knowledge base and agents immediately have access to new information — no model retraining required.

When to use RAG

RAG is ideal for:
  • HR policy assistants — Answer employee questions based on specific policy documents
  • Compliance reviewers — Verify that proposals comply with internal guidelines and regulatory requirements
  • Technical documentation Q&A — Help users find information in product documentation
  • Customer support — Answer questions based on knowledge bases and help articles
  • Contract analysis — Review contracts against your organization’s standard terms and conditions
  • Research assistants — Query large document collections to find relevant information
If your agent needs to know information specific to your organization, domain, or use case, you need RAG.

RAG pipeline architecture

MagOneAI implements a production-grade RAG pipeline with hybrid search, reranking, and advanced retrieval techniques.

Retrieval flow

1. Query embedding

The input query is converted to both a dense vector (semantic) and a sparse BM25 vector (keyword) using the configured embedding model.

2. Hybrid search

Qdrant performs parallel dense + BM25 sparse search with Reciprocal Rank Fusion (RRF) to merge results. This retrieves 50 initial candidates.

3. HyDE expansion (optional)

If HyDE is enabled, a hypothetical answer passage is generated and used as an additional search query. Results from both the original query and the HyDE passage are merged and deduplicated.

4. Reranking

A cross-encoder reranking model (BGE-reranker-v2-m3) scores each candidate’s relevance to the original query. Results are reordered by relevance with labels: HIGH, MEDIUM, or LOW.

5. Parent expansion (Small2Big)

If Small2Big chunking was used, child chunks are expanded to their parent chunks for richer context. Results are deduplicated by parent ID.

6. Context injection

Top-k results are formatted with source metadata and injected into the agent’s prompt.

Qdrant vector database

MagOneAI uses Qdrant as its vector database for semantic search and retrieval.
  • Hybrid search — Dense vector similarity + BM25 keyword matching with RRF fusion
  • Performance — Fast similarity search even over millions of vectors
  • Filtering — Combine vector similarity with metadata filtering (e.g., filter by kb_id)
  • Scalability — Handles large knowledge bases with consistent query latency
MagOneAI manages the Qdrant infrastructure for you. You simply upload documents and configure knowledge bases — the vector database operations happen automatically.

Creating and managing knowledge bases

Knowledge bases are collections of documents that agents can query. Each knowledge base has its own vector collection and can be attached to multiple agents.
1. Create a knowledge base in your project

Navigate to the Knowledge Bases section and click “Create Knowledge Base”:
  • Name — Descriptive name like “HR Policies” or “Product Documentation”
  • Description — What documents this knowledge base contains
  • Chunk size — Words per chunk (default: 400 words)
  • Chunk overlap — Overlap between chunks (default: 40 words)
  • Chunking strategy — Standard or Small2Big (see below)
  • Contextual chunking — Enable LLM-enriched chunk context (see below)
2. Upload documents

Upload documents to the knowledge base using drag-and-drop or file selection. Supported formats:
  • PDF (.pdf)
  • Microsoft Word (.docx, .doc)
  • Plain text (.txt)
  • Markdown (.md)
  • CSV (.csv)
  • Rich Text Format (.rtf)
You can upload multiple files simultaneously. Each file is processed asynchronously.
3. Documents are automatically chunked and embedded

MagOneAI processes your documents in the background:
  • Text is extracted from each document with section-level parsing
  • Content is split into chunks with section-aware splitting and title prepend
  • If contextual chunking is enabled, each chunk is enriched with LLM-generated situating context
  • Each chunk is embedded using both dense and sparse models for hybrid search
  • Vectors are stored in Qdrant with source metadata (filename, section, page number)
You can monitor processing status in the knowledge base detail view.
4. Attach the knowledge base to an agent

In the agent configuration, add the knowledge base under “Knowledge Bases”. You can attach multiple knowledge bases to a single agent. The agent will search across all attached knowledge bases when retrieving context.
5. The agent now retrieves relevant context when answering questions

When the agent executes in a workflow, it automatically queries attached knowledge bases based on the input, retrieves relevant chunks, and generates responses grounded in your documents.

Managing knowledge bases

Upload new documents at any time. They’re automatically processed and become immediately available for retrieval.

Chunking strategies

The quality of your RAG system depends heavily on how documents are chunked. MagOneAI supports two chunking strategies.

Standard chunking

Section-aware recursive splitting with title prepend for better embedding quality. How it works:
  1. Documents are parsed into sections (headings, paragraphs)
  2. Each section is recursively split using progressively finer separators: paragraph breaks (\n\n), then line breaks (\n), then sentence boundaries (.)
  3. Chunks receive a title/section prefix prepended to the embedding text (e.g., filename > Section Title: chunk text)
  4. Overlapping windows ensure information at chunk boundaries isn’t lost
Configuration:
  • Chunk size — Max words per chunk (default: 400)
  • Chunk overlap — Words of overlap between chunks (default: 40)
Best for: Most use cases. Works well with structured documents that have clear headings and sections.
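As a rough illustration of the title prepend, the text that gets embedded for each chunk might be assembled like this (a hypothetical helper, not the actual implementation):

```python
def embedding_text(filename, section_title, chunk_text):
    # Prepend "filename > Section Title: " so the embedding carries
    # document-level context alongside the chunk's own words.
    prefix = f"{filename} > {section_title}" if section_title else filename
    return f"{prefix}: {chunk_text}"

text = embedding_text(
    "remote_work_policy_2024.pdf",
    "Eligibility",
    "Employees must complete 90 days before applying.",
)
```

This is also why descriptive file names and clear section headings matter: both end up inside the embedded text.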

Small2Big chunking

A parent-child chunking strategy that retrieves on small chunks but expands to larger parent chunks for context. How it works:
  1. Documents are first split into parent chunks (default: 400 words, no overlap)
  2. Each parent chunk is sub-divided into smaller child chunks (default: 200 words)
  3. Child chunks store a reference to their parent (parent_id and parent_text)
  4. At retrieval time, search matches on precise small chunks, then expands to the full parent chunk for richer context
  5. Results are deduplicated by parent ID — only the highest-scoring child per parent is kept
Best for: When you need precise retrieval (matching on specific phrases) but want to provide broader context to the agent.
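The retrieval-time expansion and parent deduplication can be sketched in a few lines, using illustrative dictionaries in place of the real chunk records:

```python
def small2big_expand(hits, top_k=3):
    # hits: (score, child) pairs where each child carries parent_id and
    # parent_text. Keep only the highest-scoring child per parent, then
    # return the parent texts ranked by that best child score.
    best = {}
    for score, child in hits:
        pid = child["parent_id"]
        if pid not in best or score > best[pid][0]:
            best[pid] = (score, child["parent_text"])
    ranked = sorted(best.values(), key=lambda pair: pair[0], reverse=True)
    return [text for score, text in ranked[:top_k]]

hits = [
    (0.9, {"parent_id": "p1", "parent_text": "Full section on remote work."}),
    (0.7, {"parent_id": "p1", "parent_text": "Full section on remote work."}),
    (0.6, {"parent_id": "p2", "parent_text": "Full section on onboarding."}),
]
expanded = small2big_expand(hits)
```

Note that the two hits on parent `p1` collapse to one result: matching happens on precise child chunks, but the agent sees each parent only once.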

Contextual chunking

An optional LLM-enriched step that generates situating context for each chunk before embedding. This dramatically improves retrieval quality. How it works:
  1. A document summary is generated using an LLM (2-3 paragraphs covering document type, main topics, key entities)
  2. For each chunk, an LLM generates 1-2 sentences of situating context that describes how the chunk relates to the overall document
  3. The situating context is prepended to the chunk text before embedding
Configuration:
contextual_chunking:
  enabled: true
  llm_config_id: "your-llm-config"  # Required - LLM used for context generation
  summary_prompt: "..."  # Customizable document summary prompt
  context_prompt: "..."  # Customizable chunk context prompt
The context prompt receives: the document summary, the previous chunk, the current chunk, and the next chunk — giving the LLM full context to write accurate situating context. Best for: Knowledge bases where retrieval precision is critical. Adds processing time and cost during ingestion, but significantly improves retrieval quality.
Contextual chunking adds LLM cost during document ingestion (one call per chunk + one summary call per document). This is a one-time cost — retrieval performance is not affected.
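The prompt wiring might look roughly like this sketch, with a stub standing in for the configured LLM (function names and prompt wording are illustrative, not the internal prompts):

```python
def situate_chunk(llm, summary, prev_chunk, chunk, next_chunk):
    # Assemble the context prompt from the document summary and the
    # neighboring chunks, then prepend the generated situating context.
    prompt = (
        f"Document summary:\n{summary}\n\n"
        f"Previous chunk:\n{prev_chunk}\n\n"
        f"Current chunk:\n{chunk}\n\n"
        f"Next chunk:\n{next_chunk}\n\n"
        "Write 1-2 sentences situating the current chunk in the document."
    )
    context = llm(prompt)
    return f"{context}\n\n{chunk}"

# Stub LLM for illustration; a real run would call the configured model.
def stub_llm(prompt):
    return "This chunk covers remote work eligibility rules."

enriched = situate_chunk(stub_llm, "HR policy doc.", "", "90 days required.", "")
```

The enriched text (situating context plus original chunk) is what gets embedded, which is why retrieval quality improves.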

Chunk overlap

Overlap ensures important information at chunk boundaries isn’t lost: Without overlap:
Chunk 1: [...employee must complete 90 days]
Chunk 2: [of employment before remote work eligibility...]
The connection between “90 days” and “remote work eligibility” is split across chunks. With overlap:
Chunk 1: [...employee must complete 90 days of employment before]
Chunk 2: [complete 90 days of employment before remote work eligibility...]
Now both chunks contain the complete concept. Recommended overlap: 10% of chunk size (e.g., 40 words for 400-word chunks)
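A sliding-window splitter with overlap can be sketched as follows (word-based and simplified; the real splitter is also section-aware):

```python
def split_with_overlap(text, chunk_size=400, overlap=40):
    # Slide a chunk_size-word window forward by (chunk_size - overlap)
    # words each step, so consecutive chunks share `overlap` words.
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

# Small numbers for illustration: 10 words, 6-word chunks, 2-word overlap.
chunks = split_with_overlap(" ".join(str(i) for i in range(10)),
                            chunk_size=6, overlap=2)
```

With the defaults (400-word chunks, 40-word overlap) the window advances 360 words per step, matching the 10% recommendation.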

KB retrieval modes

MagOneAI supports two retrieval modes that control how agents interact with knowledge bases.

Auto mode (default)

In auto mode (kb_retrieval_mode: "auto"), the system performs a single retrieval when the agent starts executing:
  1. The agent’s input is used as the search query
  2. All attached knowledge bases are searched in parallel
  3. Retrieved chunks are injected into the agent’s system prompt as static context
  4. The agent generates its response using this context
Best for: Simple Q&A, straightforward document lookup, and cases where the input query is a good search query.

Agentic mode

In agentic mode (kb_retrieval_mode: "agentic"), the agent can iteratively search knowledge bases during its reasoning:
  1. A __kb.search tool is added to the agent’s available tools
  2. The agent decides when and what to search based on its reasoning
  3. The agent can make multiple search calls with different queries
  4. Each search returns formatted results with source metadata and relevance scores
  5. The agent synthesizes information across multiple searches
Configuration on the agent:
capabilities:
  kb_retrieval_mode: "agentic"
  max_kb_searches: 10        # Max search calls per execution (1-50)
  hyde_enabled: true          # Enable HyDE query expansion
  hyde_llm_config_id: "..."   # LLM for HyDE passage generation
How agentic search works: The agent receives a KB search tool with this schema:
  • query (required) — The search query. The agent crafts specific queries based on its reasoning.
  • top_k (optional) — Number of results (1-20, default 5)
  • kb_id (optional) — Target a specific knowledge base by ID (only shown when multiple KBs are attached)
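For illustration, the tool schema described above might render to something like this hypothetical JSON-schema dictionary (the exact internal rendering is not documented here):

```python
# Hypothetical JSON-schema rendering of the __kb.search tool parameters.
kb_search_tool = {
    "name": "__kb.search",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string",
                      "description": "The search query."},
            "top_k": {"type": "integer", "minimum": 1, "maximum": 20,
                      "default": 5,
                      "description": "Number of results to return."},
            "kb_id": {"type": "string",
                      "description": "Target a specific knowledge base."},
        },
        "required": ["query"],
    },
}
```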
Results are returned with source metadata and relevance labels:
=== SOURCE: Remote Work Policy 2024.pdf > Section 3.2 [relevance: HIGH] ===
[Page 4] Employees must complete the initial 90-day probationary period before...
=== END ===
Best for: Complex queries that require multiple searches, research tasks, and cases where the initial input isn’t a good search query on its own.
Agentic RAG is more powerful but uses more LLM tokens (each search is a tool call in the agent loop). Use auto mode for simple lookups and agentic mode for complex research tasks.

HyDE (Hypothetical Document Embeddings)

HyDE is an advanced query expansion technique that improves retrieval by generating a hypothetical answer before searching.

How HyDE works

1. Generate hypothetical passage

Given the user’s query, an LLM generates a short passage (2-3 sentences) that would directly answer the question — as if quoting from a reference document.

2. Embed the hypothetical passage

The generated passage is embedded using both dense and sparse models, just like a real query.

3. Search with both queries

The system searches with both the original query embedding AND the hypothetical passage embedding, retrieving candidates from both.

4. Merge and deduplicate

Results from both searches are merged and deduplicated. This expanded candidate set is then reranked.

Why HyDE helps

User queries often use different vocabulary than source documents. For example:
  • User asks: “Can I work from home?”
  • Document says: “Remote work eligibility requires completion of the probationary period”
The hypothetical passage bridges this vocabulary gap by generating text that’s likely to use similar language to the source documents.
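The merge-and-deduplicate step can be sketched like this, with stubs standing in for the LLM and the search backend (all names are illustrative):

```python
def hyde_search(query, llm, search, top_k=5):
    # Generate a hypothetical answer passage, search with both the original
    # query and the passage, then merge by chunk id keeping the best score.
    passage = llm(f"Write 2-3 sentences that would answer: {query}")
    merged = {}
    for hit in search(query) + search(passage):
        cid, score = hit["id"], hit["score"]
        if cid not in merged or score > merged[cid]["score"]:
            merged[cid] = hit
    return sorted(merged.values(), key=lambda h: h["score"], reverse=True)[:top_k]

# Stubs for illustration only.
def stub_llm(prompt):
    return "Remote work eligibility requires completing probation."

def stub_search(text):
    # Pretend chunk c1 matches formal policy language much better.
    return [{"id": "c1", "score": 0.8 if "probation" in text else 0.3}]

results = hyde_search("Can I work from home?", stub_llm, stub_search)
```

Here the casual query alone scores 0.3 against the policy chunk, but the formally worded HyDE passage scores 0.8, and the merge keeps the higher score.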

Enabling HyDE

HyDE is configured per agent in the capabilities section:
capabilities:
  hyde_enabled: true
  hyde_llm_config_id: "your-llm-config"  # LLM used for passage generation
HyDE works with both auto and agentic retrieval modes. In agentic mode, HyDE passages are generated automatically for each __kb.search tool call.
HyDE adds one LLM call per search query. Use a fast, cost-effective model for HyDE passage generation — the passage doesn’t need to be perfect, just directionally helpful.

Hybrid search and reranking

MagOneAI uses a multi-stage retrieval pipeline for high-quality results. Every search query produces both:
  • Dense vector — Captures semantic meaning (what the text means)
  • Sparse BM25 vector — Captures keyword relevance (what words appear)
Qdrant runs both searches in parallel and merges results using Reciprocal Rank Fusion (RRF). This combines the strengths of both approaches:
  • Semantic search finds conceptually similar content even with different vocabulary
  • Keyword search finds exact term matches (names, acronyms, codes)
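RRF itself is only a few lines: each document scores the sum of 1/(k + rank) over the rankings it appears in. This sketch uses the conventional constant k = 60; MagOneAI’s exact parameters may differ:

```python
def rrf_fuse(rankings, k=60):
    # Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank).
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["a", "b", "c"]   # ranked by semantic similarity
sparse = ["b", "d", "a"]  # ranked by BM25 keyword relevance
fused = rrf_fuse([dense, sparse])
```

Document "b" wins because it ranks highly in both lists, even though neither list puts it first overall.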

Reranking

After hybrid search returns ~50 candidates, a cross-encoder reranking model rescores each result:
  • Model: BGE-reranker-v2-m3
  • Input: (query, candidate_text) pairs
  • Output: Relevance scores with labels
    • HIGH — score > 0.5 (highly relevant)
    • MEDIUM — -1.0 < score ≤ 0.5 (moderately relevant)
    • LOW — score ≤ -1.0 (marginally relevant)
Top-k results after reranking are returned to the agent.
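The thresholds above map to a simple labeling function:

```python
def relevance_label(score):
    # Map a reranker score to the HIGH/MEDIUM/LOW labels shown in results.
    if score > 0.5:
        return "HIGH"
    if score > -1.0:
        return "MEDIUM"
    return "LOW"
```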
Reranking is optional and can be enabled/disabled via configuration. When disabled, results are ordered by hybrid search score only.

How RAG works in agent execution

When a RAG agent executes within a workflow, the retrieval and generation process follows a precise sequence:

Detailed execution flow

1. Agent receives query/input

The agent receives input from the workflow — typically a question or task that requires knowledge base consultation. Example input:
{
  "question": "What is our policy on remote work for new employees?",
  "context": "employee_onboarding"
}
2. Query is embedded (dual encoding)

The input is converted to both a dense embedding vector and a sparse BM25 vector using the configured embedding model. This dual encoding enables hybrid search — combining semantic and keyword matching.
3. Hybrid search against knowledge base vectors

Qdrant performs parallel dense + sparse search with RRF fusion:
  • Dense prefetch retrieves 150 candidates by semantic similarity
  • BM25 prefetch retrieves 150 candidates by keyword relevance
  • RRF fusion merges and deduplicates to top 50 candidates
  • Results are filtered by knowledge base ID
4. Reranking and parent expansion

Candidates are reranked by a cross-encoder model, then Small2Big parent expansion is applied if applicable. Final top-k results are returned with:
  • Chunk text — The actual document content (or parent text if expanded)
  • Source metadata — Filename, section heading, page number
  • Relevance score — Reranker score with HIGH/MEDIUM/LOW label
5. Context injection into agent prompt

Retrieved chunks are formatted and added to the agent’s context:
=== SOURCE: Remote Work Policy 2024.pdf > Section 3.2 [relevance: HIGH] ===
[Page 4] Employees must complete the initial 90-day probationary period
before becoming eligible for remote work arrangements...
=== END ===

=== SOURCE: Employee Handbook.pdf > Onboarding [relevance: MEDIUM] ===
[Page 12] New employees are assigned a buddy during their first 90 days...
=== END ===
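A chunk-formatting helper matching the layout above might look like this (illustrative, not the internal formatter):

```python
def format_chunk(source, section, page, label, text):
    # Render one retrieved chunk in the === SOURCE ... === block format.
    header = f"=== SOURCE: {source} > {section} [relevance: {label}] ==="
    return f"{header}\n[Page {page}] {text}\n=== END ==="

block = format_chunk(
    "Remote Work Policy 2024.pdf", "Section 3.2", 4, "HIGH",
    "Employees must complete the initial 90-day probationary period...",
)
```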
6. Agent generates response grounded in retrieved documents

The LLM generates a response using:
  • Its general language understanding and reasoning capabilities
  • The specific content from retrieved chunks
  • The agent’s persona and instructions
The response is grounded in your documents rather than the model’s general training data.

Best practices

Keep documents focused and well-structured

Good document structure:
  • Clear headings and sections
  • Logical information hierarchy
  • Consistent formatting
  • One topic per document or section
Well-structured documents chunk better and retrieve more accurately. Section headings are used in the chunk title prefix, improving embedding quality.

Use descriptive file names

File names appear in source citations and are prepended to chunk embeddings: Good file names:
  • remote_work_policy_2024.pdf
  • employee_onboarding_checklist.pdf
  • gdpr_compliance_guidelines.pdf
Poor file names:
  • document_final_v3.pdf
  • policy.pdf
  • untitled.pdf

Choose the right retrieval mode

Scenario                                        | Recommended Mode
Simple Q&A with direct questions                | Auto
Complex research requiring multiple searches    | Agentic
Agents that need to explore a topic iteratively | Agentic
High-volume, cost-sensitive workloads           | Auto
Multi-KB searches with targeted queries         | Agentic

Enable HyDE for vocabulary mismatch

If your users ask questions using different terminology than your source documents, enable HyDE. It’s especially helpful for:
  • Technical documentation with domain-specific jargon
  • Policy documents with formal language
  • Multi-language knowledge bases

Use contextual chunking for high-stakes use cases

Contextual chunking significantly improves retrieval quality at the cost of higher ingestion time and LLM usage. Enable it for:
  • Compliance and regulatory documents
  • Legal contracts and policies
  • Medical or financial documents where precision matters

Test retrieval quality before production deployment

Before deploying RAG agents to production:
1. Create a test query set

Build a set of representative questions your agents will receive.
2. Evaluate retrieval

For each test query, examine the retrieved chunks:
  • Are the most relevant chunks retrieved?
  • Is irrelevant content being retrieved?
  • Are there gaps in coverage?
3. Iterate on configuration

Adjust chunk size, chunking strategy, retrieval mode, and HyDE settings based on the results.
4. Test end-to-end agent performance

Evaluate not just retrieval, but agent answer quality using the retrieved context.
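One simple metric for step 2 is recall@k: the fraction of test queries whose known-relevant chunk appears in the top-k results. A sketch with a stub retriever (names and stub behavior are illustrative):

```python
def recall_at_k(test_set, retrieve, k=5):
    # test_set: (query, expected_chunk_id) pairs.
    hits = 0
    for query, expected_id in test_set:
        retrieved_ids = [hit["id"] for hit in retrieve(query)[:k]]
        if expected_id in retrieved_ids:
            hits += 1
    return hits / len(test_set)

# Stub retriever for illustration; a real test would call your KB search.
def stub_retrieve(query):
    return [{"id": "remote" if "remote" in query else "other"}]

score = recall_at_k([("remote work?", "remote"), ("pto policy?", "pto")],
                    stub_retrieve, k=1)
```

Tracking this number while you adjust chunk size, strategy, and HyDE settings turns step 3 into a measurable iteration loop.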

Troubleshooting RAG issues

Agent can’t find relevant information

Possible causes:
  • Documents aren’t in the knowledge base
  • Chunk size is too small or too large
  • Query and documents use different terminology
  • Contextual chunking not enabled for complex documents
Solutions:
  • Verify documents uploaded and processed
  • Experiment with different chunk sizes (200-600 words)
  • Enable HyDE to bridge vocabulary gaps
  • Enable contextual chunking for better chunk embeddings
  • Try agentic mode so the agent can craft better search queries
Retrieval returns irrelevant content

Possible causes:
  • Knowledge base contains too much diverse content
  • Chunk size is too large
  • Reranking not enabled
Solutions:
  • Split knowledge bases by domain
  • Reduce chunk size
  • Enable reranking to improve precision
  • Use Small2Big chunking for precise matching with broader context
Answers aren’t grounded in the documents

Possible causes:
  • Relevant chunks not retrieved (retrieval problem)
  • Relevant chunks retrieved but not used by LLM (generation problem)
  • Persona doesn’t emphasize grounding in documents
Solutions:
  • Test retrieval separately from generation
  • Update persona to emphasize: “Base your answer ONLY on the provided documents”
  • Switch to agentic mode so the agent actively searches for information
  • Increase top-k to provide more context
Retrieval is slow

Possible causes:
  • Knowledge base is very large
  • HyDE adding latency (extra LLM call per search)
  • Too many knowledge bases attached to agent
Solutions:
  • Monitor Qdrant performance metrics
  • Use a faster LLM for HyDE passage generation
  • Reduce the number of attached knowledge bases
  • Consider splitting large KBs into smaller, focused ones

Next steps

Personas and prompts

Craft prompts that effectively use retrieved context

Building workflows

Integrate RAG agents into Temporal workflows

Agent node

Configure agents with RAG in your workflows