Chain-of-thought and reasoning prompts
Chain-of-thought prompting instructs the model to reason through a problem step by step before producing a final answer. On tasks that require multi-step reasoning, such as mathematical problems, logical deduction, and complex classification, this approach typically produces higher accuracy than prompting for a direct answer. Variants include zero-shot chain-of-thought (appending a phrase such as 'Let's think step by step' to the prompt, without examples), few-shot chain-of-thought (providing worked examples that include reasoning steps), and least-to-most prompting (breaking a complex problem into progressively harder sub-problems and solving them in sequence, so that each solution can build on the earlier ones). The mechanism is not fully understood, but the effect is empirically well documented across model families and task types.
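As a rough illustration of the zero-shot and few-shot variants, the sketch below only builds prompt strings and does not call any model. The `WorkedExample` type, the function names, and the "Q:/A:" layout are assumptions made for this example, not a required format.

```python
from dataclasses import dataclass


@dataclass
class WorkedExample:
    """One demonstration for few-shot chain-of-thought (hypothetical helper type)."""
    question: str
    reasoning: str  # the intermediate steps, written out in prose
    answer: str


def zero_shot_cot(question: str) -> str:
    """Append a step-by-step trigger phrase; no examples are included."""
    return f"Q: {question}\nA: Let's think step by step."


def few_shot_cot(examples: list[WorkedExample], question: str) -> str:
    """Prefix the question with worked examples that show their reasoning."""
    blocks = [
        f"Q: {ex.question}\nA: {ex.reasoning} The answer is {ex.answer}."
        for ex in examples
    ]
    # The final block is left open so the model continues with its own reasoning.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    demo = WorkedExample(
        question="A shelf holds 3 boxes of 4 books. How many books?",
        reasoning="Each box has 4 books and there are 3 boxes, so 3 * 4 = 12.",
        answer="12",
    )
    print(zero_shot_cot("A train travels 60 km in 1.5 hours. What is its speed?"))
    print()
    print(few_shot_cot([demo], "A crate holds 5 bags of 6 apples. How many apples?"))
```

The few-shot version ends each demonstration with a fixed "The answer is ..." phrase, which is one common way to make the final answer easy to locate in the model's output.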
Self-consistency and ensemble techniques
Self-consistency generates multiple completions for the same prompt at a sampling temperature above zero and selects the most common final answer across the samples, in effect a majority vote. This works because reasoning errors are often inconsistent: faulty reasoning paths tend to land on different wrong answers, while the correct answer tends to appear in the majority of samples for problems the model can handle. The technique increases reliability at the cost of more API calls and higher latency. It is most useful on tasks with discrete correct answers (math, factual questions, structured extraction) and less useful on open-ended generation, where there is no single correct output to select.
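The voting step can be sketched as follows. This is a minimal illustration, not a reference implementation: `sample` stands in for whatever function calls the model at a temperature above zero, and the "Answer: <value>" convention used by `extract_answer` is an assumption about how the prompt asks the model to format its result.

```python
import re
from collections import Counter
from typing import Callable, Optional


def extract_answer(completion: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None if there is none."""
    matches = re.findall(r"Answer:\s*(.+)", completion)
    return matches[-1].strip() if matches else None


def self_consistent_answer(
    sample: Callable[[str], str],  # hypothetical: one sampled completion per call
    prompt: str,
    n: int = 5,
) -> Optional[str]:
    """Sample n completions and return the most frequent extracted answer."""
    answers = [extract_answer(sample(prompt)) for _ in range(n)]
    # Completions with no parseable answer do not get a vote.
    votes = Counter(a for a in answers if a is not None)
    if not votes:
        return None
    # On a tie, Counter.most_common keeps the answer that was seen first.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # A stand-in sampler that replays canned completions instead of calling a model.
    canned = iter([
        "3 * 14 = 42. Answer: 42",
        "3 * 14 = 41. Answer: 41",
        "14 + 14 + 14 = 42. Answer: 42",
    ])
    print(self_consistent_answer(lambda _prompt: next(canned), "What is 3 * 14?", n=3))
```

Voting on the extracted answer rather than on the full completion matters here, since two samples can reach the same answer through differently worded reasoning. Each call costs one extra model request, so `n` directly trades reliability against cost and latency.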
Structured prompting patterns for complex tasks
For complex tasks, prompt structure affects output quality beyond the content of individual instructions. Role-definition prompts establish the model's perspective and expertise before the task, which improves consistency on domain-specific work. Constraint enumeration prompts list what the model should and should not do in explicit terms, reducing ambiguity. Decomposition prompts break a complex task into named stages and instruct the model to address each stage before proceeding. Meta-prompts instruct the model to evaluate its own output against defined criteria before finalizing it. Each pattern adds prompt length and processing time, so the benefit needs to exceed that overhead, which it typically does only for high-stakes or complex tasks.
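The four patterns can be combined in a single prompt. The sketch below assembles one from its parts; the function name, parameter names, and section wording are all illustrative assumptions, and a real prompt would be tuned to the task and model.

```python
from typing import Sequence


def bullet(items: Sequence[str]) -> str:
    return "\n".join(f"- {item}" for item in items)


def numbered(items: Sequence[str]) -> str:
    return "\n".join(f"{i}. {item}" for i, item in enumerate(items, start=1))


def build_structured_prompt(
    role: str,
    task: str,
    must: Sequence[str],
    must_not: Sequence[str],
    stages: Sequence[str],
    criteria: Sequence[str],
) -> str:
    """Assemble role, constraints, stages, and self-check into one prompt string."""
    sections = [
        f"You are {role}.",                    # role definition
        f"Task: {task}",
        "Do:\n" + bullet(must),                # constraint enumeration
        "Do not:\n" + bullet(must_not),
        "Work through these stages in order, finishing each before starting "
        "the next:\n" + numbered(stages),      # decomposition
        "Before finalizing, check your output against these criteria and "
        "revise if any fails:\n" + bullet(criteria),  # self-evaluation
    ]
    return "\n\n".join(sections)


if __name__ == "__main__":
    print(build_structured_prompt(
        role="a reviewer of database migration scripts",
        task="Assess whether the attached migration is safe to run in production.",
        must=["Quote the statement you are commenting on"],
        must_not=["Rewrite the whole script"],
        stages=["Identify schema changes", "Identify locking risks", "Summarize"],
        criteria=["Every risk names a specific statement"],
    ))
```

Keeping each pattern as a separate argument makes it easy to drop the ones that do not pay for their added length on simpler tasks.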