How to build an MCP server
Expose a system to AI agents through a [Model Context Protocol](/mcp) server — with tools an agent can actually use well, credentials it cannot leak, and logging that tells you what it did.
Before you start
- A target system with an API or library you can call, and a credential scoped for the access you intend to expose
- A development machine with Python or Node.js — the two SDKs with the deepest support
- An MCP client to test against (Claude Desktop, Claude Code, Cursor, or VS Code all speak the protocol)
Steps
- 1
Decide what the server exposes — as verbs, not endpoints
List the five to ten things an agent should be able to do with your system, phrased as actions: search_invoices, create_ticket, get_customer. Resist mirroring your REST API one-to-one. An agent choosing between forty thinly described endpoints performs worse than one choosing between eight purposeful tools, and every tool you expose is permission surface you now own. Start with reads; add writes once the reads behave.
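One way to keep the list deliberate is to write it down as data before writing any handlers. The sketch below is a hypothetical catalogue: the tool names come from the examples above, while `PlannedTool`, the `access` field and `release_set` are illustrative assumptions, not part of any SDK. It ships read tools first and holds writes back until they are explicitly enabled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PlannedTool:
    name: str     # verb-shaped: what the agent wants to do, not which endpoint
    purpose: str  # one line; the seed of the eventual tool description
    access: str   # "read" or "write": the permission surface this tool adds


# Hypothetical catalogue for an invoicing system: a handful of purposeful
# verbs rather than one tool per REST endpoint.
CATALOGUE = [
    PlannedTool("search_invoices", "Find invoices by customer, status or date range", "read"),
    PlannedTool("get_customer", "Fetch one customer's billing profile", "read"),
    PlannedTool("create_ticket", "Open a support ticket about an invoice", "write"),
]


def release_set(catalogue: List[PlannedTool], include_writes: bool = False) -> List[PlannedTool]:
    """Return the tools to expose: reads only, until writes are explicitly enabled."""
    return [t for t in catalogue if t.access == "read" or include_writes]


if __name__ == "__main__":
    for tool in release_set(CATALOGUE):
        print(tool.access, tool.name)
```

Keeping the catalogue in one place also makes the permission surface reviewable: every `write` entry is a decision someone signed off on.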
- 2
Scaffold with an official SDK
Use the official Python or TypeScript SDK rather than implementing the protocol by hand — the specification at modelcontextprotocol.io covers the wire format, but the SDKs handle sessions, capability negotiation, and transport so your code is mostly tool definitions. A minimal server is a few dozen lines: define tools, give each a name, description, and input schema, and register handlers.
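To make "name, description, input schema, handler" concrete, here is a stdlib-only sketch of the bookkeeping an SDK does for you. It is deliberately not the SDK's API: `register`, `list_tools` and `call_tool` are hypothetical helpers. The dictionaries they return follow the shapes the specification uses for `tools/list` results (a `tools` array of `name`, `description`, `inputSchema`) and `tools/call` results (`content` blocks plus an `isError` flag).

```python
import json
from typing import Any, Callable, Dict

# A toy registry mimicking what an MCP SDK tracks per tool. The real SDKs
# also handle sessions, capability negotiation and transport.
_TOOLS: Dict[str, Dict[str, Any]] = {}


def register(name: str, description: str, input_schema: dict,
             handler: Callable[..., Any]) -> None:
    _TOOLS[name] = {
        "definition": {"name": name, "description": description, "inputSchema": input_schema},
        "handler": handler,
    }


def list_tools() -> dict:
    """Shape of a tools/list result: one definition per registered tool."""
    return {"tools": [entry["definition"] for entry in _TOOLS.values()]}


def call_tool(name: str, arguments: dict) -> dict:
    """Shape of a tools/call result: content blocks plus an isError flag."""
    result = _TOOLS[name]["handler"](**arguments)
    return {"content": [{"type": "text", "text": json.dumps(result)}], "isError": False}


# Hypothetical handler standing in for a call to the target system.
def get_customer(customer_id: str) -> dict:
    return {"customer_id": customer_id, "name": "Example Ltd"}


register(
    "get_customer",
    "Fetch one customer's billing profile by ID. Returns the customer's ID and name.",
    {
        "type": "object",
        "properties": {"customer_id": {"type": "string"}},
        "required": ["customer_id"],
    },
    get_customer,
)

if __name__ == "__main__":
    print(json.dumps(list_tools(), indent=2))
    print(call_tool("get_customer", {"customer_id": "C-42"}))
```

In practice you rarely build these dictionaries by hand: the official Python SDK's FastMCP interface, for instance, derives the definition from a decorated function's name, docstring and type hints. The sketch only exists to show what ends up on the wire.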
- 3
Write tool descriptions as the interface they are
The model reads your tool names, descriptions, and parameter schemas to decide what to call and how — they are an API contract with a non-deterministic consumer. State what each tool does, when to use it, and what it returns; constrain parameters with enums and formats rather than prose; and say what the tool must NOT be used for when misuse is plausible. A vague description is a bug that manifests as the agent calling the wrong tool.
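As an illustration, here is a hypothetical `search_invoices` definition written as a contract: what it does, when to use it, what it returns, and what it is not for, with parameters constrained by `pattern`, `enum` and `format` (standard JSON Schema keywords) rather than prose. The `review_tool` helper is an assumption of this sketch, a crude lint that flags candidates for human review, not a measure of quality; a free-text search parameter may legitimately stay unconstrained.

```python
from typing import List

SEARCH_INVOICES = {
    "name": "search_invoices",
    "description": (
        "Search invoices for a single customer. Use this when the user asks about "
        "a customer's billing history or outstanding balance. Returns at most 50 "
        "invoices (id, date, amount, status), newest first. Do NOT use this to look "
        "up a customer's contact details; use get_customer for that."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string", "pattern": "^C-[0-9]+$",
                            "description": "Customer ID, e.g. C-1042"},
            "status": {"type": "string", "enum": ["open", "paid", "overdue"],
                       "description": "Only return invoices with this status"},
            "since": {"type": "string", "format": "date",
                      "description": "Only invoices dated on or after this day (YYYY-MM-DD)"},
        },
        "required": ["customer_id"],
        "additionalProperties": False,
    },
}

# The kind of definition that produces wrong-tool calls.
VAGUE = {
    "name": "invoices",
    "description": "Invoice stuff.",
    "inputSchema": {
        "type": "object",
        "properties": {"status": {"type": "string"}, "query": {"type": "string"}},
    },
}


def review_tool(tool: dict) -> List[str]:
    """Flag common description smells for a human to look at."""
    findings = []
    if len(tool["description"].split()) < 12:
        findings.append("description too short to say what, when and what it returns")
    for name, spec in tool["inputSchema"].get("properties", {}).items():
        if spec.get("type") == "string" and not (spec.keys() & {"enum", "format", "pattern"}):
            findings.append(f"string parameter {name!r} has no enum, format or pattern")
        if "description" not in spec:
            findings.append(f"parameter {name!r} has no description")
    return findings


if __name__ == "__main__":
    print(review_tool(SEARCH_INVOICES))
    for finding in review_tool(VAGUE):
        print("-", finding)
```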
- 4
Keep credentials on the server side
The server holds the credential for the target system; the client and the model never see it. Scope that credential to exactly what the tools do — a read-only server gets a read-only key. If the server will act on behalf of different users, resolve the user's own authorisation per session instead of wielding one god-token for everyone; the audit trail downstream should distinguish who the agent was working for.
- 5
Choose the transport for where it will run
stdio runs the server as a local child process of the client: right for personal and development use, and the default most examples assume. Streamable HTTP makes it a network service multiple agents can reach, which is the moment it needs real authentication (OAuth is the spec's answer), TLS, and the threat model of any other exposed service. Build local-first, but decide which deployment you are actually supporting before anyone else depends on it.
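One way to make that decision explicit is a startup guard that refuses to expose the server over HTTP without the controls the move demands. This is a sketch under assumptions: `Deployment`, its fields, and the `"streamable-http"` config string are hypothetical names of this example, and real token validation would be done by your OAuth library, not by checking that an issuer is configured.

```python
from dataclasses import dataclass
from typing import List, Optional

LOOPBACK_HOSTS = {"127.0.0.1", "::1", "localhost"}


@dataclass(frozen=True)
class Deployment:
    transport: str                     # "stdio" or "streamable-http" (hypothetical config values)
    host: str = "127.0.0.1"            # only meaningful for HTTP
    auth_issuer: Optional[str] = None  # hypothetical: OAuth issuer whose tokens you validate
    tls_enabled: bool = False


def deployment_problems(d: Deployment) -> List[str]:
    """List reasons this configuration exposes the server without the controls it needs."""
    if d.transport == "stdio":
        return []  # local child process: the only caller is the client that spawned it
    if d.transport != "streamable-http":
        return [f"unknown transport {d.transport!r}"]
    problems = []
    if d.auth_issuer is None:
        problems.append("HTTP transport with no authentication configured")
    if d.host not in LOOPBACK_HOSTS and not d.tls_enabled:
        problems.append(f"listening on {d.host} without TLS")
    return problems


def check_or_exit(d: Deployment) -> None:
    problems = deployment_problems(d)
    if problems:
        raise SystemExit("refusing to start: " + "; ".join(problems))


if __name__ == "__main__":
    check_or_exit(Deployment("stdio"))
    print(deployment_problems(Deployment("streamable-http", host="0.0.0.0")))
```

The point of failing closed is that the jump from stdio to HTTP then has to be a deliberate configuration change rather than a side effect of someone changing a bind address.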
- 6
Test with a real client, then try to break it
Wire the server into an MCP client and watch an agent use it on real tasks; when you need to isolate behaviour, the MCP Inspector lets you exercise tools directly. Then probe the failure modes: malformed parameters, gigantic results, the model calling tools in an order you did not anticipate. How the server fails matters as much as how it works: an error message that says what went wrong lets the agent correct its call, while a bare stack trace tends to get retried forever.
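The MCP specification has tool execution failures reported inside the tool result with `isError: true` (rather than as protocol-level errors) so the model can see them. A minimal sketch of a wrapper that does this, assuming handlers signal caller-fixable problems with a hypothetical `ToolInputError`: argument-shape mistakes and validation failures come back as actionable text, while unexpected exceptions are logged to stderr and replaced by a short message with no stack trace. `get_invoice` is a hypothetical handler.

```python
import inspect
import json
import sys
import traceback
from typing import Any, Callable


class ToolInputError(ValueError):
    """Raised by handlers when the caller can fix the problem by changing arguments."""


def tool_result(payload: Any, is_error: bool = False) -> dict:
    text = payload if isinstance(payload, str) else json.dumps(payload)
    return {"content": [{"type": "text", "text": text}], "isError": is_error}


def safe_call(handler: Callable[..., Any], arguments: dict) -> dict:
    # Reject unknown or missing parameter names before running any handler code.
    try:
        inspect.signature(handler).bind(**arguments)
    except TypeError as exc:
        return tool_result(f"Invalid arguments: {exc}. Check the tool's input schema.", is_error=True)
    try:
        return tool_result(handler(**arguments))
    except ToolInputError as exc:
        # Actionable: say what was wrong and what a valid value looks like.
        return tool_result(f"Invalid arguments: {exc}", is_error=True)
    except Exception:
        # Unexpected: the trace goes to the server's stderr, not into the model's context.
        traceback.print_exc(file=sys.stderr)
        return tool_result("Internal error in this tool. Retrying with the same arguments "
                           "will not help; report the failure to the user.", is_error=True)


def get_invoice(invoice_id: str) -> dict:
    if not invoice_id.startswith("INV-"):
        raise ToolInputError(f"invoice_id must look like 'INV-1234', got {invoice_id!r}")
    return {"invoice_id": invoice_id, "status": "open"}


if __name__ == "__main__":
    print(safe_call(get_invoice, {"invoice_id": "42"}))
    print(safe_call(get_invoice, {"id": "INV-1"}))
```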
- 7
Log every invocation and register the server
Record each tool call — which tool, which parameters, which session, what was returned — because your server is now part of an agent's audit story and will be asked questions during someone else's incident. Then register it wherever your organisation tracks agent infrastructure: owner, target system, credential scope, deployment. An MCP server nobody knows about is shadow infrastructure with a credential inside.
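A minimal sketch of one structured audit record per call, written as a JSON line to stderr (for a stdio server, stdout is reserved for protocol messages). The logger name, the `SENSITIVE_KEYS` list and the 200-character preview are assumptions to tune for your system; if you need full results for investigations, store them separately under an explicit retention policy rather than in the log stream.

```python
import json
import logging
import sys
import time
import uuid
from typing import Any, Dict

audit = logging.getLogger("invoices_mcp.audit")  # hypothetical logger name
_handler = logging.StreamHandler(sys.stderr)
_handler.setFormatter(logging.Formatter("%(message)s"))
audit.addHandler(_handler)
audit.setLevel(logging.INFO)

SENSITIVE_KEYS = {"password", "token", "api_key", "secret"}  # hypothetical; tune per system


def redact(params: Dict[str, Any]) -> Dict[str, Any]:
    return {k: ("[REDACTED]" if k.lower() in SENSITIVE_KEYS else v) for k, v in params.items()}


def log_invocation(session_id: str, acting_for: str, tool: str,
                   params: Dict[str, Any], result: dict, started: float) -> dict:
    text = "".join(block.get("text", "") for block in result.get("content", []))
    record = {
        "event_id": str(uuid.uuid4()),
        "ts": time.time(),
        "session_id": session_id,
        "acting_for": acting_for,       # the user the agent was working for
        "tool": tool,
        "params": redact(params),
        "is_error": bool(result.get("isError", False)),
        "result_chars": len(text),
        "result_preview": text[:200],   # enough to answer "what did it return?"
        "duration_ms": round((time.time() - started) * 1000, 1),
    }
    audit.info(json.dumps(record))
    return record
```

Call it once per tool invocation, after the result has been built, so failed calls are recorded with the same fields as successful ones.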
Common pitfalls
- Mirroring the API instead of designing tools. Forty endpoint-shaped tools maximise both the agent's confusion and your permission surface; the work is in choosing the eight that matter.
- Fat tools that do everything via a mode parameter — they defeat permission scoping, because you can no longer grant the safe half without the dangerous half.
- Treating descriptions as documentation rather than interface. The model acts on what the description says, yet nobody reviews it once the demo works, so when it is wrong the agent's behaviour drifts unnoticed.
- Assuming stdio forever. The jump from local child process to shared HTTP service changes the threat model completely, and it tends to happen by enthusiasm rather than decision.
- Returning raw, unbounded results. A 50,000-row query result blows the agent's context and your token bill; cap, paginate, and summarise on the server side.
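A minimal sketch of server-side capping and pagination for a list-returning tool. The hard ceiling, default page size, field names and cursor encoding are assumptions of this example; the opaque cursor mirrors how the MCP specification paginates its own list operations. In a real server you would push the limit and offset into the query itself; slicing an in-memory list here just keeps the sketch self-contained.

```python
import base64
import json
from typing import Any, Dict, List, Optional

MAX_PAGE = 50      # hard ceiling the model cannot raise
DEFAULT_PAGE = 20


def _encode_cursor(offset: int) -> str:
    return base64.urlsafe_b64encode(json.dumps({"o": offset}).encode()).decode()


def _decode_cursor(cursor: str) -> int:
    message = "cursor is invalid; pass next_cursor from a previous page unchanged"
    try:
        offset = json.loads(base64.urlsafe_b64decode(cursor.encode()))["o"]
    except (ValueError, KeyError, TypeError):
        raise ValueError(message) from None
    if not isinstance(offset, int) or offset < 0:
        raise ValueError(message)
    return offset


def paginate(rows: List[Dict[str, Any]], limit: int = DEFAULT_PAGE,
             cursor: Optional[str] = None) -> Dict[str, Any]:
    limit = max(1, min(limit, MAX_PAGE))
    start = _decode_cursor(cursor) if cursor else 0
    page = rows[start:start + limit]
    end = start + len(page)
    return {
        "items": page,
        "total": len(rows),
        # A one-line summary the agent can reason about without reading every row.
        "summary": f"showing {start + 1}-{end} of {len(rows)}" if page else f"no results past {start}",
        "next_cursor": _encode_cursor(end) if end < len(rows) else None,
    }


if __name__ == "__main__":
    rows = [{"id": i} for i in range(120)]
    first = paginate(rows, limit=500)
    print(first["summary"], first["next_cursor"])
```

Clamping `limit` on the server means a model that asks for 500 rows still gets at most 50, plus a cursor and a summary telling it more exist.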