Is fine-tuning ever right for injecting knowledge?

Rarely. Weights store knowledge without provenance, without per-user access control, and without an update path short of retraining. The defensible cases are narrow: stable domain vocabulary and patterns, learned once, where being uncitable is acceptable. Documents, policies, and anything a user might ask 'says who?' about belong in retrieval.

Build choices

RAG vs fine-tuning

RAG gives a model access to knowledge at question time; fine-tuning changes the model's weights to alter how it behaves. Teams reach for them interchangeably because both 'teach the model about our stuff' — but they solve different problems, and the most common mistake is fine-tuning to inject facts, which is the job retrieval does better, cheaper, and reversibly.

Dimension	RAG	Fine-tuning
What it changes	What the model can see — context per query	What the model is — weights and behaviour
Right problem	Knowledge: facts, documents, freshness	Behaviour: format, style, domain reflexes
Updating	Re-index a document; effect is immediate	Re-train and re-deploy; effect is a release
Traceability	Answers cite retrieved sources	Knowledge is baked in — no citation possible
Access control	Enforceable at retrieval, per user	None — everyone gets what training saw
Cost shape	Ongoing per-query retrieval and tokens	Up-front training runs plus per-version maintenance
Failure mode	Bad retrieval produces ungrounded answers	Drift, regressions, and stale knowledge frozen in

The verdict

Default to RAG for anything that is knowledge — documents, facts, policies, anything that changes or needs a citation or carries permissions — and reserve fine-tuning for behaviour the prompt cannot reliably buy: rigid output formats, deep domain phrasing, latency-critical reflexes where you are paying per-token for instructions the weights could absorb. The two compose: a fine-tuned model retrieving through RAG is common in production. The decision flips toward fine-tuning only when the same instructions ride every prompt at volume, or the behaviour gap survives serious prompt and retrieval work — and almost never for facts, because weights cannot cite, cannot forget, and cannot respect who is asking.

The verdict

Frequently asked questions