Engineering 9 min read April 7, 2026

RAG Without the Hallucinations: Building Grounded Agents

RAG lets your agents answer from facts, not imagination. We walk through chunking strategy, embedding model choice, and the pgvector queries powering AgentDyne knowledge bases.

Priya Sharma

Head of Engineering, AgentDyne

Why Agents Hallucinate

Large language models are trained to produce fluent, plausible text. When asked a question outside their training data, they often do not say "I don't know"; instead, they generate a confident-sounding answer that may be completely fabricated.

RAG (Retrieval-Augmented Generation) addresses this by retrieving relevant passages from a knowledge base and injecting those facts into the model's context before it generates a response.

Chunking: The Critical Step Most Get Wrong

The quality of your RAG system is determined primarily by chunking strategy, not model choice.

Our benchmarks on support documentation:

Chunk size (chars) | Retrieval precision | Answer quality
------------------ | ------------------- | --------------
200                | 42%                 | Poor
500                | 71%                 | Good
800                | 78%                 | Very Good
1200               | 73%                 | Good
2000               | 61%                 | Fair

The sweet spot is 500–900 characters with 100-character overlaps between chunks.
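As a rough illustration of the overlap idea, here is a minimal sketch that splits text into fixed-size character windows. The function name `chunk_text` and the default of 800 characters are assumptions chosen to match the table above; the post does not say whether the production chunker also respects sentence or paragraph boundaries, so treat this as a simplification.

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into character windows of `chunk_size` that overlap by `overlap`.

    Hypothetical helper: fixed-size windows only, no sentence-boundary handling.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text, so we do not
        # emit a trailing chunk that is entirely contained in the previous one.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With these defaults, each chunk repeats the last 100 characters of the previous one, so a fact that straddles a boundary still appears whole in at least one chunk.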

Embedding Model Choice

We use OpenAI text-embedding-3-small for all knowledge base embeddings.

At our scale:

  • text-embedding-3-small: $0.02 / 1M tokens
  • text-embedding-3-large: $0.13 / 1M tokens

For most RAG use cases, the precision improvement of 3-large does not justify 6.5x the cost. We validated this against a 5,000-question benchmark — 3-small achieves 94% of the answer quality at 15% of the cost.
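The cost figures follow directly from the two list prices. The short sketch below reproduces the arithmetic; the helper name `embedding_cost` and the 50M-token corpus size are illustrative assumptions, not AgentDyne numbers.

```python
# Prices per 1M tokens, as quoted above.
SMALL_PRICE_PER_M = 0.02
LARGE_PRICE_PER_M = 0.13


def embedding_cost(total_tokens: int, price_per_million: float) -> float:
    """Cost in USD to embed `total_tokens` tokens at a given per-1M-token price."""
    return total_tokens / 1_000_000 * price_per_million


# 3-large costs 0.13 / 0.02 = 6.5 times as much as 3-small.
cost_ratio = LARGE_PRICE_PER_M / SMALL_PRICE_PER_M

# Equivalently, 3-small costs about 15% of 3-large (0.02 / 0.13 is roughly 0.154).
cost_share = SMALL_PRICE_PER_M / LARGE_PRICE_PER_M

# Hypothetical corpus of 50M tokens, for scale.
small_total = embedding_cost(50_000_000, SMALL_PRICE_PER_M)
large_total = embedding_cost(50_000_000, LARGE_PRICE_PER_M)
```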

The pgvector Query

Once chunks are embedded, retrieval is a single SQL query:

-- $1: query embedding (vector), $2: knowledge base ID
SELECT
  c.id,
  d.title AS document_title,
  c.content,
  (1 - (c.embedding <=> $1))::float AS similarity  -- <=> is cosine distance
FROM rag_chunks c
JOIN rag_documents d ON d.id = c.document_id
WHERE c.knowledge_base_id = $2
  AND (1 - (c.embedding <=> $1)) > 0.65            -- drop weak matches
ORDER BY c.embedding <=> $1                        -- nearest chunks first
LIMIT 5;

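The 0.65 threshold is a floor on cosine similarity: pgvector's <=> operator returns cosine distance, so 1 minus that value is the similarity, and anything at or below 0.65 is treated as semantically unrelated and filtered out. We use an IVFFlat index (lists = 100) for ~10x faster search.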
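To make the query's semantics concrete, here is a plain-Python sketch of the same logic: scope to one knowledge base, score by cosine similarity, drop anything at or below the threshold, and return the top matches. The function names and the dictionary shape of `chunks` are assumptions for illustration. This version is an exact brute-force scan, whereas an IVFFlat index performs approximate search, so the two can differ slightly on real data.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (1 minus cosine distance)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # simplification: treat zero vectors as unrelated
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, chunks, knowledge_base_id, threshold=0.65, limit=5):
    """Brute-force equivalent of the SQL query above (hypothetical helper)."""
    scored = []
    for chunk in chunks:
        if chunk["knowledge_base_id"] != knowledge_base_id:
            continue  # WHERE c.knowledge_base_id = $2
        similarity = cosine_similarity(query_embedding, chunk["embedding"])
        if similarity > threshold:  # AND similarity > 0.65
            scored.append({"id": chunk["id"], "content": chunk["content"],
                           "similarity": similarity})
    # ORDER BY distance ascending is the same as similarity descending.
    scored.sort(key=lambda row: row["similarity"], reverse=True)
    return scored[:limit]  # LIMIT 5
```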

Context Injection

Retrieved chunks are injected into the agent's system prompt in a structured block, together with an instruction to cite sources. The citation instruction is critical: without it, models paraphrase context without indicating which source they used.
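One possible shape for that block is sketched below. The delimiters, the instruction wording, and the function name `build_context_block` are all hypothetical; the post does not show AgentDyne's actual prompt format. The input is assumed to be the rows returned by the retrieval query (`document_title`, `content`, `similarity`).

```python
# Hypothetical instruction text; tune the wording for your own agents.
CITATION_INSTRUCTION = (
    "Answer using only the sources above. After each claim, cite the source "
    "number in square brackets, for example [1]."
)
NO_CONTEXT_INSTRUCTION = (
    "No relevant sources were found. Say that you do not know rather than guessing."
)


def build_context_block(chunks: list[dict]) -> str:
    """Format retrieved chunks as a numbered, citable block for the system prompt."""
    if not chunks:
        # Nothing passed the similarity threshold: steer the model toward abstaining.
        return NO_CONTEXT_INSTRUCTION

    lines = ["=== KNOWLEDGE BASE CONTEXT ==="]
    for number, chunk in enumerate(chunks, start=1):
        # Number each source so the model has something stable to cite.
        lines.append(f"[{number}] {chunk['document_title']} "
                     f"(similarity {chunk['similarity']:.2f})")
        lines.append(chunk["content"].strip())
        lines.append("")
    lines.append("=== END CONTEXT ===")
    lines.append(CITATION_INSTRUCTION)
    return "\n".join(lines)
```

Numbering the sources gives the model a short, unambiguous handle for each chunk, which also makes citations easy to check afterwards.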

Evaluating Your RAG System

Before going to production, run these three checks:

1. Retrieval recall: For 50 hand-picked questions, does the correct chunk appear in the top 5? Target: >85%.
2. Answer faithfulness: Are claims in the answer supported by retrieved context? Target: >90%.
3. Out-of-scope detection: For questions your KB cannot answer, does the agent correctly say it doesn't know? Target: >80%.
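The first check is the easiest to automate. The sketch below computes recall at k over a labelled question set; `recall_at_k`, `meets_target`, and the test-case fields are hypothetical names, and `retrieve_fn` stands in for whatever function returns ranked chunk IDs in your system. Faithfulness and out-of-scope detection need per-answer judgments (human or model-graded), so they are not covered here.

```python
from typing import Callable


def recall_at_k(test_cases: list[dict],
                retrieve_fn: Callable[[str], list[str]],
                k: int = 5) -> float:
    """Fraction of questions whose expected chunk appears in the top-k results.

    Each test case is assumed to look like:
        {"question": "...", "expected_chunk_id": "..."}
    """
    if not test_cases:
        raise ValueError("test_cases must not be empty")

    hits = 0
    for case in test_cases:
        top_ids = retrieve_fn(case["question"])[:k]
        if case["expected_chunk_id"] in top_ids:
            hits += 1
    return hits / len(test_cases)


def meets_target(score: float, target: float) -> bool:
    """True when a metric strictly exceeds its target (e.g. recall > 0.85)."""
    return score > target
```

A typical use would be to build the 50 hand-picked cases once, then rerun `recall_at_k` whenever chunk size, overlap, or the similarity threshold changes.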
