Prompt engineering tutorial

Goal

Build practical prompt engineering skills by working through a progression of prompting tasks — from basic specification to few-shot examples, chain-of-thought, and structured output — with a language model accessible via a chat interface or API.

Before you start

  • Access to a language model via a chat interface or API
  • Ability to send prompts and observe outputs
  • No programming experience required; optional API access enables more systematic testing

Steps

  1. Zero-shot prompting: task specification without examples

    Start with the most direct approach: a clear task description with no examples. Write a prompt that specifies what you want the model to do, the constraints on the output, and the format you expect. Test it on three to five different inputs. Observe where the output is exactly what you specified, where it is close but not quite right, and where it fails. The gaps between your specification and the outputs you receive are the starting point for improvement. This exercise reveals which parts of your task description were clear and which were ambiguous to the model.
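If you have optional API access, the zero-shot step can be sketched in a few lines of Python. This is a minimal sketch, not a prescribed method: the function name `build_zero_shot_prompt`, the prompt layout, and the sample reviews are all assumptions made for illustration. It only assembles the prompts; send each one to whichever model you use.

```python
def build_zero_shot_prompt(task: str, constraints: list[str], output_format: str, text: str) -> str:
    """Assemble a zero-shot prompt: task, constraints, expected format, then the input."""
    lines = [f"Task: {task}"]
    if constraints:
        lines.append("Constraints:")
        lines.extend(f"- {c}" for c in constraints)
    lines.append(f"Output format: {output_format}")
    lines.append("")
    lines.append(f"Input: {text}")
    return "\n".join(lines)


# Hypothetical test inputs; replace them with three to five of your own.
test_inputs = [
    "The battery lasts two days and charges fast.",
    "Arrived broken and support never replied.",
    "It works, I guess.",
]

# One prompt per input, all sharing the same task specification.
prompts = [
    build_zero_shot_prompt(
        task="Classify the sentiment of the product review.",
        constraints=["Use only the information in the review."],
        output_format="One word: positive, negative, or neutral.",
        text=review,
    )
    for review in test_inputs
]

print(prompts[0])
```

Keeping the specification in one place means every test input sees exactly the same wording, so differences in the outputs come from the inputs rather than from accidental edits to the prompt.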

  2. Add constraints to eliminate unwanted output patterns

    After observing your zero-shot results, identify the most common undesired output pattern: outputs that are too long, that include information you did not ask for, that use a tone you did not want, or that interpret the task differently than intended. Add one specific constraint to address that pattern, either negative ("Do not include an introduction") or positive ("Answer in at most two sentences"). Test the same inputs again and confirm that the constraint had the intended effect. Add one constraint at a time: stacking multiple new constraints makes it hard to tell which one fixed the problem and whether any introduced new issues.
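One way to confirm that a single constraint had the intended effect is to measure how often the undesired pattern appears before and after the change. The sketch below assumes the pattern is "too long" and uses a 20-word limit. `fake_model` is a hand-written stand-in so the example runs on its own; it does not reflect real model behavior and should be replaced with your own model call.

```python
from typing import Callable


def violation_rate(outputs: list[str], is_violation: Callable[[str], bool]) -> float:
    """Return the fraction of outputs that show the undesired pattern."""
    if not outputs:
        return 0.0
    return sum(1 for o in outputs if is_violation(o)) / len(outputs)


def too_long(output: str, max_words: int = 20) -> bool:
    """Undesired pattern for this sketch: more than max_words words."""
    return len(output.split()) > max_words


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call (assumption, NOT real model behavior)."""
    if "at most 20 words" in prompt:
        return "Short summary."
    return "word " * 40


# The two prompts differ by exactly one constraint.
base_prompt = "Summarize the following text.\n\nText: {text}"
constrained_prompt = "Summarize the following text in at most 20 words.\n\nText: {text}"

inputs = ["first sample text", "second sample text", "third sample text"]

before = [fake_model(base_prompt.format(text=t)) for t in inputs]
after = [fake_model(constrained_prompt.format(text=t)) for t in inputs]

print("violation rate before:", violation_rate(before, too_long))
print("violation rate after:", violation_rate(after, too_long))
```

Because the prompts differ in only one place, any change in the violation rate can be attributed to that one constraint.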

  3. Few-shot prompting: show the model what correct looks like

    For tasks where correct output is easier to demonstrate than to describe, add worked examples to the prompt. Write two or three complete input-output pairs that show the model exactly what a correct response looks like. Place the examples before the actual task input. Test the prompt on your original inputs and on new ones. Compare the few-shot results with the zero-shot results: few-shot prompts typically improve consistency and format compliance, and reduce the need for elaborate verbal instructions. The quality of your examples matters — use representative, correctly annotated examples rather than edge cases or idealized ones.
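A few-shot prompt can be assembled mechanically from a list of input-output pairs. The following is a minimal sketch under assumed conventions: the `Input:` / `Output:` labels, the function name, and the example messages are illustrative choices, not a required format.

```python
def build_few_shot_prompt(instruction: str, examples: list[tuple[str, str]], new_input: str) -> str:
    """Place worked input-output pairs before the actual task input."""
    parts = [instruction, ""]
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}")
        parts.append(f"Output: {example_output}")
        parts.append("")
    # The real input comes last, with an open "Output:" for the model to complete.
    parts.append(f"Input: {new_input}")
    parts.append("Output:")
    return "\n".join(parts)


# Hypothetical examples; use representative cases from your own task.
examples = [
    ("Meeting moved to 3 pm on Friday.", "date=Friday; time=15:00"),
    ("Call me Monday at 9 in the morning.", "date=Monday; time=09:00"),
]

prompt = build_few_shot_prompt(
    "Extract the date and time from each message.",
    examples,
    "Lunch on Tuesday at noon.",
)
print(prompt)
```

Formatting every example identically is part of the demonstration: the model tends to copy the layout it sees, so inconsistencies between examples can carry over into the output.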

  4. Chain-of-thought prompting: get the model to reason before answering

    For tasks that require multi-step reasoning — math, logic, classification with multiple criteria, code analysis — prompting the model to show its reasoning before giving an answer often improves accuracy. Add an instruction like 'Think through this step by step before giving your final answer' or 'First, identify the relevant factors. Then, apply them one at a time before reaching a conclusion.' Test on tasks where you can verify whether the reasoning is correct, not just the final answer. When the reasoning is wrong, use it to diagnose whether the model misunderstood the task, lacked relevant information, or made a reasoning error.
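To check the reasoning separately from the final answer, it helps to ask for the answer on a clearly marked last line. The `Final answer:` marker below is an assumption added for this sketch, not a standard, and `sample_response` is a hand-written stand-in for a model reply.

```python
import re
from typing import Optional

COT_INSTRUCTION = (
    "Think through this step by step before giving your final answer. "
    "End with a line of the form 'Final answer: <answer>'."
)


def add_chain_of_thought(task_prompt: str) -> str:
    """Append the step-by-step instruction to an existing task prompt."""
    return f"{task_prompt}\n\n{COT_INSTRUCTION}"


def split_reasoning_and_answer(response: str) -> tuple[str, Optional[str]]:
    """Split a response into (reasoning, answer); answer is None if the marker is missing."""
    match = re.search(r"^Final answer:\s*(.+)$", response, flags=re.MULTILINE | re.IGNORECASE)
    if match is None:
        return response.strip(), None
    return response[: match.start()].strip(), match.group(1).strip()


# Hand-written stand-in for a model response (assumption for illustration).
sample_response = (
    "There are 3 boxes with 4 pens each, so 3 * 4 = 12 pens.\n"
    "2 pens are removed, so 12 - 2 = 10.\n"
    "Final answer: 10"
)

reasoning, answer = split_reasoning_and_answer(sample_response)
print("Reasoning to review:\n" + reasoning)
print("Answer to verify:", answer)
```

A missing marker (an answer of `None`) is itself a useful signal: it usually means the model did not follow the format instruction, which is a different failure from a reasoning error.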

  5. Structured output: constraining format for downstream use

    Many applications need outputs in a specific machine-readable format such as JSON, CSV, XML, or a structured list. Prompting the model to produce structured output requires an explicit format description, an example of correctly formatted output, and often a constraint that limits the response to the structured content alone, with no surrounding explanation. Test that the output is parseable by the downstream system, not just readable to a human. Models can produce output that looks like JSON but is not valid JSON, so add validation to your testing to surface format errors before the output is used in code.
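A validation check for JSON output can be as small as the sketch below, which uses Python's standard `json` module. The required keys (`name`, `priority`) are a hypothetical schema chosen for illustration; substitute whatever your downstream system expects.

```python
import json

# Hypothetical schema for this sketch: the keys the downstream code relies on.
REQUIRED_KEYS = {"name", "priority"}


def parse_structured_output(raw: str) -> dict:
    """Parse model output as JSON and check required keys; raise ValueError on any failure."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


good = '{"name": "Fix login bug", "priority": "high"}'
# Looks like JSON to a human, but single quotes make it invalid JSON.
looks_like_json = "{'name': 'Fix login bug', 'priority': 'high'}"

print(parse_structured_output(good))
try:
    parse_structured_output(looks_like_json)
except ValueError as err:
    print("rejected:", err)
```

Running every test output through a check like this turns "looks right" into "parses and has the expected fields", which is the standard the downstream code will actually apply.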

  6. Evaluate and improve: building a feedback loop

    Prompt engineering without evaluation is guesswork. Build a simple evaluation loop: a set of test inputs with expected outputs, a prompt to test, and a record of outputs and whether they met criteria. Run this loop each time you change the prompt. When you find a failure, add the failing input to your test set before fixing it — this prevents the same failure from returning unnoticed. Over time, your test set becomes a regression suite that protects improvements you have already made. The discipline of evaluation is what separates one-off prompt experiments from a reliable production prompting practice.
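The evaluation loop described above can be sketched as a small function that runs every test case through a prompt and records the result. Everything here is illustrative: `run_suite`, the exact-match pass criterion, the test cases, and `fake_model` (a stand-in that is NOT a real model) are assumptions; pass your own model-calling function in place of `fake_model`.

```python
from typing import Callable


def run_suite(
    prompt_template: str,
    cases: list[tuple[str, str]],
    call_model: Callable[[str], str],
) -> list[dict]:
    """Run each (input, expected) case through the prompt and record the outcome."""
    records = []
    for input_text, expected in cases:
        output = call_model(prompt_template.format(input=input_text))
        records.append(
            {
                "input": input_text,
                "expected": expected,
                "output": output,
                # Assumed criterion for this sketch: case-insensitive exact match.
                "passed": output.strip().lower() == expected.lower(),
            }
        )
    return records


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call (assumption, NOT real model behavior)."""
    return "positive" if "great" in prompt.lower() else "negative"


prompt_template = "Classify the sentiment as positive, negative, or neutral.\n\nReview: {input}"

# Hypothetical test set; failing inputs you discover later get appended here.
cases = [
    ("This is great", "positive"),
    ("Terrible experience", "negative"),
    ("It is fine", "neutral"),
]

records = run_suite(prompt_template, cases, fake_model)
failures = [r for r in records if not r["passed"]]
print(f"{len(records) - len(failures)}/{len(records)} passed")
for r in failures:
    print("FAILED:", r["input"], "expected", r["expected"], "got", r["output"])
```

Exact matching suits short categorical outputs; for free-form text you would swap in a criterion that fits the task, such as a length check or a required-phrase check.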

Common pitfalls

  • Skipping evaluation and judging prompts by whether the first output looks good: single outputs are not reliable indicators of prompt quality across the full input distribution.
  • Adding more examples to fix a problem that is actually a missing constraint: examples improve consistency; constraints change behavior. Diagnose which one a failure requires before trying to fix it.
  • Writing chain-of-thought instructions without verifying the reasoning: a model that reaches the right answer through flawed reasoning will produce wrong answers on related inputs.
  • Assuming a prompt that works with one model will work with another: prompts are sensitive to model behavior and may need revision when moving between models or model versions.
  • Not saving prompt versions: without version history, you cannot roll back to a version that worked when an update causes a regression.
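For the last pitfall, even a very small version history makes rollback possible. The sketch below is one assumed approach: an in-memory class, hypothetically named `PromptHistory`, that identifies each saved prompt by a short hash of its text. The class and its method names are illustrative only.

```python
import hashlib


class PromptHistory:
    """Keep every saved prompt so an earlier version can be retrieved later."""

    def __init__(self) -> None:
        # Each entry is (version_id, note, prompt_text), in the order saved.
        self._versions: list[tuple[str, str, str]] = []

    def save(self, prompt: str, note: str = "") -> str:
        """Store a prompt and return a short ID derived from its content."""
        version_id = hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:8]
        self._versions.append((version_id, note, prompt))
        return version_id

    def get(self, version_id: str) -> str:
        """Return the prompt text saved under version_id."""
        for saved_id, _note, prompt in reversed(self._versions):
            if saved_id == version_id:
                return prompt
        raise KeyError(version_id)

    def log(self) -> list[tuple[str, str]]:
        """Return (version_id, note) pairs, oldest first."""
        return [(saved_id, note) for saved_id, note, _prompt in self._versions]


history = PromptHistory()
v1 = history.save("Summarize the text in one sentence.", note="baseline")
v2 = history.save("Summarize the text in one sentence. Use plain language.", note="add tone constraint")

# If v2 causes a regression, the baseline is still available.
print(history.log())
print(history.get(v1))
```

Keeping prompts as files in an ordinary version control system serves the same purpose and also records who changed what and when.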

Frequently asked questions

What is the difference between zero-shot, one-shot, and few-shot prompting?

Zero-shot prompting gives the model a task description with no examples. One-shot prompting includes one example of a correct input-output pair. Few-shot prompting includes two or more examples. More examples generally improve consistency for complex tasks but increase prompt length and cost. For simple tasks, zero-shot with a clear specification is often sufficient; for complex or nuanced tasks, few-shot examples improve quality.

Does prompt engineering work differently with different models?

Yes. Models differ in how they respond to instruction style, how strictly they follow constraints, how they interpret examples, and what reasoning approaches work best. Techniques that work well with one model may underperform with another. If you are switching models, treat your existing prompts as a starting point rather than a finished product, and re-evaluate against your test set with the new model.
