Build choices

RAG vs fine-tuning

Retrieval-augmented generation (RAG) gives a model access to knowledge at question time; fine-tuning changes the model's weights to alter how it behaves. Teams reach for them interchangeably because both 'teach the model about our stuff', but they solve different problems. The most common mistake is fine-tuning to inject facts, a job retrieval does better, more cheaply, and reversibly.
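To make "knowledge at question time" concrete, here is a minimal sketch of the retrieval half of RAG. Everything in it is an assumption for illustration: the `Doc` type, the word-overlap scoring (a production system would more likely use embeddings and a vector index), and the prompt wording. The point it shows is that changing a document changes what the model sees on the very next query, with no change to the model itself.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str  # hypothetical identifier, reused as the citation label
    text: str


def tokenize(text: str) -> set:
    # Lowercased word set with basic punctuation stripped.
    # Stand-in for a real embedding model (assumption).
    return {w.strip(".,?!").lower() for w in text.split()}


def retrieve(index: list, question: str, k: int = 2) -> list:
    # Rank documents by how many words they share with the question
    # and drop documents that share none.
    q = tokenize(question)
    scored = [(len(q & tokenize(d.text)), d) for d in index]
    scored = [(score, d) for score, d in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


def build_prompt(index: list, question: str) -> str:
    # The model's weights never change; only this context does.
    docs = retrieve(index, question)
    context = "\n".join(f"[{d.doc_id}] {d.text}" for d in docs)
    return (
        "Answer using only these sources and cite their ids.\n"
        f"{context}\n\nQuestion: {question}"
    )
```

Replacing an entry in `index` and calling `build_prompt` again is the whole update path in this sketch, which is the sense in which retrieval is reversible: put the old document back and the old context returns.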

| Dimension | RAG | Fine-tuning |
| --- | --- | --- |
| What it changes | What the model can see: context per query | What the model is: weights and behaviour |
| Right problem | Knowledge: facts, documents, freshness | Behaviour: format, style, domain reflexes |
| Updating | Re-index a document; effect is immediate | Re-train and re-deploy; effect is a release |
| Traceability | Answers cite retrieved sources | Knowledge is baked in; no citation possible |
| Access control | Enforceable at retrieval, per user | None; everyone gets what training saw |
| Cost shape | Ongoing per-query retrieval and tokens | Up-front training runs plus per-version maintenance |
| Failure mode | Bad retrieval produces ungrounded answers | Drift, regressions, and stale knowledge frozen in |
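The traceability and access-control rows can be sketched in a few lines. This is an illustration under stated assumptions, not a reference design: the `Chunk` type, the group-based permission model, and the plain term matching are all hypothetical. What it shows is that the permission check runs before relevance, so a passage the user may not read never reaches the prompt, and every passage that does reach it carries its source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source: str                # document the passage came from, used as the citation
    text: str
    allowed_groups: frozenset  # groups permitted to read the source (assumed model)


def retrieve_for_user(chunks, user_groups, query_terms):
    """Return chunks the user may read that mention any query term."""
    groups = frozenset(user_groups)
    terms = {t.lower() for t in query_terms}
    hits = []
    for chunk in chunks:
        # Permission check comes first: a forbidden chunk is skipped
        # no matter how well it matches the query.
        if not (chunk.allowed_groups & groups):
            continue
        if terms & set(chunk.text.lower().split()):
            hits.append(chunk)
    return hits


def grounded_context(hits):
    """Pair each passage with its source so the answer can cite it."""
    return [f"{c.text} (source: {c.source})" for c in hits]
```

A fine-tuned model has no equivalent of the `continue` branch: whatever was in the training data is available to every caller.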

The verdict

Default to RAG for anything that is knowledge: documents, facts, policies, and anything that changes, needs a citation, or carries permissions. Reserve fine-tuning for behaviour the prompt cannot reliably buy: rigid output formats, deep domain phrasing, and latency-critical reflexes where every prompt pays, in tokens and in latency, for instructions the weights could absorb. The two compose: a fine-tuned model retrieving through RAG is common in production. The decision flips toward fine-tuning only when the same instructions ride every prompt at volume, or when the behaviour gap survives serious prompt and retrieval work. It almost never flips for facts, because weights cannot cite, cannot forget, and cannot respect who is asking.
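The "same instructions ride every prompt at volume" test is at heart a break-even calculation, which can be sketched as below. Every number and parameter name here is a hypothetical placeholder, and the model is deliberately crude: it assumes the fine-tuned model costs the same per token as the base model and that fine-tuning removes the instruction prefix entirely, neither of which is guaranteed. It says nothing about quality.

```python
from typing import Optional


def monthly_instruction_cost(
    instruction_tokens: int,
    queries_per_month: int,
    price_per_million_tokens: float,
) -> float:
    # Cost of re-sending the same instruction prefix on every query.
    total_tokens = instruction_tokens * queries_per_month
    return total_tokens * price_per_million_tokens / 1_000_000


def months_to_break_even(
    training_cost: float,
    monthly_maintenance: float,
    monthly_saving: float,
) -> Optional[float]:
    # Net saving is what remains after paying to maintain the tuned model.
    net = monthly_saving - monthly_maintenance
    if net <= 0:
        return None  # fine-tuning never pays for itself on cost alone
    return training_cost / net


# Hypothetical inputs: 1,500 instruction tokens on 2,000,000 queries a month
# at 1.00 per million input tokens.
saving = monthly_instruction_cost(1_500, 2_000_000, 1.00)
payback = months_to_break_even(
    training_cost=5_000, monthly_maintenance=500, monthly_saving=saving
)
```

With these made-up inputs the prefix costs 3,000 a month and the payback period is two months. Cut the volume to 200,000 queries and the function returns `None`, which is the usual case for teams that have not yet reached volume.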

Frequently asked questions

Is fine-tuning ever right for injecting knowledge?

Rarely. Weights store knowledge without provenance, without per-user access control, and without an update path short of retraining. The defensible cases are narrow: stable domain vocabulary and patterns, learned once, where being uncitable is acceptable. Documents, policies, and anything that might prompt a user to ask 'says who?' belong in retrieval.
