RAG vs fine-tuning
RAG (retrieval-augmented generation) gives a model access to knowledge at question time; fine-tuning changes the model's weights to alter how it behaves. Teams reach for them interchangeably because both seem to 'teach the model about our stuff', but they solve different problems. The most common mistake is fine-tuning to inject facts, a job retrieval does better, more cheaply, and reversibly.
| Dimension | RAG | Fine-tuning |
|---|---|---|
| What it changes | What the model can see — context per query | What the model is — weights and behaviour |
| Best suited to | Knowledge: facts, documents, freshness | Behaviour: format, style, domain reflexes |
| Updating | Re-index a document; effect is immediate | Re-train and re-deploy; effect is a release |
| Traceability | Answers cite retrieved sources | Knowledge is baked in — no citation possible |
| Access control | Enforceable at retrieval, per user | None — everyone gets what training saw |
| Cost shape | Ongoing per-query retrieval and tokens | Up-front training runs plus per-version maintenance |
| Failure mode | Bad retrieval produces ungrounded answers | Drift, regressions, and stale knowledge frozen in |
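The RAG column can be made concrete with a minimal Python sketch. It shows retrieval with per-user access control, source ids carried through for citation, and updates that take effect on the next query. It is a toy under stated assumptions:

- Keyword overlap stands in for embedding search.
- `Doc`, `Index`, `upsert`, `delete`, `retrieve`, and `build_prompt` are hypothetical names, not any library's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Doc:
    # Hypothetical document record: id for citation, text, and the groups allowed to see it.
    doc_id: str
    text: str
    allowed_groups: frozenset[str]


@dataclass
class Index:
    # Hypothetical in-memory index; a real system would use a search or vector store.
    docs: dict[str, Doc] = field(default_factory=dict)

    def upsert(self, doc: Doc) -> None:
        # Re-indexing replaces the stored copy; the very next query sees the new text.
        self.docs[doc.doc_id] = doc

    def delete(self, doc_id: str) -> None:
        # Removing knowledge is a delete, not a retraining run.
        self.docs.pop(doc_id, None)

    def retrieve(self, query: str, user_groups: set[str], k: int = 3) -> list[Doc]:
        terms = set(query.lower().split())
        # Access control: filter BEFORE ranking so forbidden text never reaches the prompt.
        visible = [d for d in self.docs.values() if d.allowed_groups & user_groups]
        # Keyword overlap is a stand-in assumption for real relevance scoring.
        scored = [(len(terms & set(d.text.lower().split())), d) for d in visible]
        scored = [(s, d) for s, d in scored if s > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [d for _, d in scored[:k]]


def build_prompt(question: str, docs: list[Doc]) -> str:
    if not docs:
        # Guard against the RAG failure mode: with nothing retrieved, do not invite a guess.
        return f"No sources were found. Say you don't know.\n\nQuestion: {question}"
    # Each passage carries its id so the answer can cite it.
    context = "\n".join(f"[{d.doc_id}] {d.text}" for d in docs)
    return (
        "Answer using only the sources below and cite their ids.\n"
        f"{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    index = Index()
    # Hypothetical documents for illustration only.
    index.upsert(Doc("refund-policy", "refund window is 30 days", frozenset({"customers", "staff"})))
    index.upsert(Doc("finance-note", "refund costs reduce margin", frozenset({"finance"})))

    # A customer never sees the finance-only note, even though it matches "refund".
    print([d.doc_id for d in index.retrieve("what is the refund window", {"customers"})])
    # → ['refund-policy']

    # Updating knowledge is a re-index, not a release.
    index.upsert(Doc("refund-policy", "refund window is 45 days", frozenset({"customers", "staff"})))
    print(build_prompt("What is the refund window?", index.retrieve("refund window", {"customers"})))
```

The design choice that matters is the order of operations. Permissions are applied before scoring, so a user's prompt is built only from documents that user may read. A fine-tuned model has no equivalent step, because the text it was trained on is not filtered per user at query time.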
The verdict
Default to RAG for anything that is knowledge: documents, facts, policies, and anything that changes, needs a citation, or carries permissions. Reserve fine-tuning for behaviour the prompt cannot reliably buy. That means rigid output formats, deep domain phrasing, and latency- or cost-sensitive paths where every request pays, in tokens and time, for long instructions the weights could absorb. The two compose: a fine-tuned model that retrieves through RAG is a common production setup. The decision flips toward fine-tuning only when the same instructions ride every prompt at high volume, or when a behaviour gap survives serious prompt and retrieval work. It almost never flips for facts, because weights cannot cite, cannot forget on demand, and cannot respect who is asking.
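The "same instructions on every prompt at volume" trigger is a break-even calculation. Below is a rough sketch in Python. It assumes fine-tuning carries a fixed monthly cost (training runs and per-version evaluation, amortized) plus an optional per-request serving premium. The function name, parameters, and figures are hypothetical placeholders, not real pricing.

```python
import math


def breakeven_requests_per_month(
    instruction_tokens: int,
    price_per_million_input_tokens: float,
    finetune_fixed_cost_per_month: float,
    finetune_extra_cost_per_request: float = 0.0,
) -> float:
    """Monthly request volume above which absorbing instructions into weights pays off.

    Hypothetical helper; every input is a caller-supplied assumption: the instruction
    prefix fine-tuning would remove, the input-token price, the amortized cost of
    training and maintaining fine-tuned versions, and any per-request serving premium.
    """
    # What each request stops paying once the instructions live in the weights.
    saved_per_request = instruction_tokens * price_per_million_input_tokens / 1_000_000
    margin = saved_per_request - finetune_extra_cost_per_request
    if margin <= 0:
        # If a serving premium eats the savings, fine-tuning never breaks even on cost.
        return math.inf
    return finetune_fixed_cost_per_month / margin


if __name__ == "__main__":
    # Hypothetical inputs: a 2,000-token instruction block at $1 per million input tokens,
    # against $1,000 per month to train, evaluate, and maintain fine-tuned versions.
    print(breakeven_requests_per_month(2_000, 1.0, 1_000.0))  # roughly 500,000 requests/month
```

Below the break-even volume, the instructions are cheaper left in the prompt. Cost is only one of the two triggers the verdict names. The other is a behaviour gap that survives prompt and retrieval work, and arithmetic cannot settle that. If your provider discounts cached prompt prefixes, price the instruction tokens at the discounted rate, which pushes the break-even point higher.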