What changes against classic RAG
In a classic pipeline, every question triggers one retrieval, with the query as written. An agentic system treats retrieval as a tool: the model can rewrite the query, split a complex question into sub-searches, route different sub-questions to different sources, judge whether what came back actually answers anything, and iterate when it does not. Each of those is a decision a fixed pipeline cannot make — and each is a decision that can now be made badly, which is the honest trade.
The patterns in circulation
Most agentic-RAG implementations compose a few recognisable moves. Query planning decomposes the question before any search happens. Self-checking — the model critiques retrieved context for relevance and coverage before using it — catches the silent failure where retrieval returns plausible-but-wrong passages. Multi-source routing sends factual lookups, document search, and structured queries to different tools rather than one index. And iterative deepening retries with reformulated queries when confidence is low. None of this requires exotic machinery; it is the standard [agent loop](/learn/agentic-ai-architecture) with retrieval among the tools.
What it costs and when it pays
Every added decision is added latency, token spend, and a new place to fail — an agent can loop on retrieval as readily as on anything else, so step budgets apply here too. Agentic RAG earns its cost where questions are genuinely multi-hop, sources are heterogeneous, or query quality varies wildly; it is overhead where a single well-tuned retrieval already answers most questions. The operational baseline does not change: retrieved content is untrusted input wherever it enters the loop, and the [security rules](/guides/secure-agentic-ai) that gate consequential actions on retrieved content apply with more force, not less, once the agent is steering its own reading.