Retrieval Config

Precision-driven retrieval for RAG, built for secure, controlled environments.

RetrievalConfig, part of the Web Data Source, defines exactly which content becomes searchable inside your isolated or air-gapped deployment. It gives teams full control over what gets embedded, supporting secure data extraction, data sovereignty compliance, and enterprise-grade security within a Zero-Trust architecture.
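To make the idea concrete, here is a minimal sketch of a configuration object that decides whether a given URL is in scope for indexing. The class and its fields (`include_paths`, `exclude_paths`, `content_selectors`, `max_chunk_tokens`) are hypothetical and do not reflect the actual RetrievalConfig schema; the API reference is the source of truth for real field names.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from urllib.parse import urlparse


@dataclass
class RetrievalConfig:
    """Hypothetical shape of a retrieval configuration (field names are illustrative)."""
    include_paths: list[str] = field(default_factory=lambda: ["/*"])
    exclude_paths: list[str] = field(default_factory=list)
    # Not used by should_index(); an extraction step would consume these.
    content_selectors: list[str] = field(default_factory=lambda: ["main", "article"])
    max_chunk_tokens: int = 512

    def should_index(self, url: str) -> bool:
        """True if the URL path matches an include pattern and no exclude pattern."""
        path = urlparse(url).path or "/"
        if any(fnmatchcase(path, pattern) for pattern in self.exclude_paths):
            return False
        return any(fnmatchcase(path, pattern) for pattern in self.include_paths)


config = RetrievalConfig(include_paths=["/docs/*"], exclude_paths=["/docs/archive/*"])
print(config.should_index("https://intranet.example/docs/install"))     # True
print(config.should_index("https://intranet.example/docs/archive/v1"))  # False
print(config.should_index("https://intranet.example/blog/post"))        # False
```

Checking exclusions before inclusions means a narrow exclude pattern can carve a subtree out of a broad include pattern.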

Why RetrievalConfig is a game changer
✔️ Selective indexing: embed only the meaningful parts of a page, such as product descriptions, documentation bodies, and structured details
✔️ Noise-free retrieval: keep menus, ads, boilerplate, and footers out of your embeddings
✔️ High-quality RAG: cleaner vectors, fewer hallucinations, more relevant answers
✔️ Works with private-network crawling, with no internet dependency
✔️ Built for compliance in controlled-access environments
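Noise-free extraction comes down to keeping text from content regions and discarding text from navigation, ads, and footers. The sketch below uses Python's standard-library `html.parser` and matches on tag names only, a simplified stand-in for real CSS/XPath selectors; `ContentExtractor`, `extract_content`, and the default keep/drop tag lists are illustrative assumptions, not part of the product.

```python
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Collect text inside 'keep' tags while skipping text inside 'drop' tags.

    Tag-name matching only: a simplified stand-in for CSS/XPath selectors.
    """

    def __init__(self, keep=("main", "article"),
                 drop=("nav", "header", "footer", "aside", "script", "style")):
        super().__init__()
        self.keep = set(keep)
        self.drop = set(drop)
        self.keep_depth = 0  # > 0 while inside a kept element
        self.drop_depth = 0  # > 0 while inside a dropped element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.drop:
            self.drop_depth += 1
        elif tag in self.keep:
            self.keep_depth += 1

    def handle_endtag(self, tag):
        if tag in self.drop and self.drop_depth:
            self.drop_depth -= 1
        elif tag in self.keep and self.keep_depth:
            self.keep_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep text only when inside a content region and not inside noise.
        if text and self.keep_depth and not self.drop_depth:
            self.parts.append(text)


def extract_content(html: str) -> str:
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


page = (
    "<nav>Home | Products | Login</nav>"
    "<main><h1>Widget X</h1><p>Widget X supports offline installs.</p>"
    "<aside>Sponsored: buy Widget Y</aside></main>"
    "<footer>(c) 2024 Example Corp</footer>"
)
print(extract_content(page))  # Widget X Widget X supports offline installs.
```

The depth counters assume reasonably well-formed markup; unclosed `nav` or `main` tags will skew them, which is one reason a real selector engine is preferable in production.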

How it works

PathPattern-based targeting: define which URLs or URL segments should be indexed.
CSS/XPath selectors: extract only the valuable content blocks for embedding.
Token-aware chunking: control chunk sizes in tokens to fit your embedding model and LLM context window, keeping embeddings stable and consistent.
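Token-aware chunking can be sketched as a sliding window over tokens. This is a minimal illustration, not the product's implementation: `chunk_by_tokens`, `max_tokens`, and `overlap` are hypothetical names, and whitespace splitting stands in for a real tokenizer. In practice, count tokens with the same tokenizer your embedding model uses.

```python
def chunk_by_tokens(text: str, max_tokens: int = 256, overlap: int = 32) -> list[str]:
    """Split text into windows of at most max_tokens tokens.

    Consecutive windows share `overlap` tokens so context is not lost at
    chunk boundaries. Tokens are whitespace-separated words here, a rough
    stand-in for the tokenizer of your embedding model.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")

    tokens = text.split()  # stand-in tokenizer: split on whitespace
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


text = " ".join(f"w{i}" for i in range(10))
print(chunk_by_tokens(text, max_tokens=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

Overlap trades a little extra index size for continuity across chunk boundaries; setting it to 0 gives strictly disjoint chunks.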

Together, these capabilities let you build a curated, highly relevant knowledge base from any website, including sites inside private or restricted networks, while preserving maximum security and architectural control.

🔗 Full API reference: Retrieval Config