Production System Prompt Patterns

Real production prompts converge on the same structural moves: explicit roles, hard constraints with recovery paths, and tool descriptions that act like operating manuals. These are maintenance patterns as much as prompting patterns.

Why Look at Production Prompts

System prompt advice usually starts from first principles. Production prompts show what teams actually did after real users found the edge cases. The prompt grows around failures: users ask for unsafe things, tools get misused, policies need exceptions, and product features need exact routing rules.

That makes published and leaked prompts messy but instructive. They are not templates to copy; they are fossil records of operational problems.

Three patterns show up repeatedly: hard constraints are separated from soft behavior, role framing becomes a priority system rather than a persona, and tool descriptions carry much more context than their names suggest.

Constraints Need Recovery Paths

The strongest production constraints do not stop at “do not”; they pair the prohibition with the behavior the model should execute instead.

ChatGPT’s DALL-E tool definition is a good example. The content policy rules are numbered, specific, and paired with substitution procedures. A rule about named artists does not only say to avoid living artists; it gives a mechanical replacement procedure: substitute the artist name with style adjectives, an associated movement or era, and the primary medium. The constraint and the recovery path are fused into one instruction.

GitHub Copilot CLI uses the opposite style: a compact prohibited-actions block with short hard stops. The section works because the domain is narrow and the fallback is explicit: stop and tell the user when the requested action hits a boundary, without trying to explain the whole policy inside the prompt.

Claude’s prompts tend to use more prose, with hardcoded bright lines separated from soft defaults. That adds length, but it gives operators room to adjust behavior without weakening the non-negotiable constraints.

The pattern is the same across all three: a negative constraint works when it is specific, prominent, and paired with a next action. A vague prohibition buried inside a paragraph is mostly decoration.
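One way to keep that pairing honest is to make the recovery path a required field wherever constraints are stored. The sketch below is illustrative only: the `Constraint` class, the `render_constraints` helper, and both example rules are hypothetical wording, not text taken from any of the prompts discussed above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Constraint:
    """A hard rule plus the action to take when a request hits it."""
    rule: str       # the prohibition, stated specifically
    recovery: str   # what the model should do instead


def render_constraints(constraints: List[Constraint]) -> str:
    """Render numbered rules, rejecting any rule that lacks a next action."""
    lines = []
    for number, c in enumerate(constraints, start=1):
        if not c.recovery.strip():
            raise ValueError(f"constraint {number} has no recovery path: {c.rule!r}")
        lines.append(f"{number}. {c.rule} Instead: {c.recovery}")
    return "\n".join(lines)


# Hypothetical wording, loosely modeled on the two styles described above.
EXAMPLE = [
    Constraint(
        rule="Do not name a living artist in an image prompt.",
        recovery="Describe the style with adjectives, a movement or era, and the primary medium.",
    ),
    Constraint(
        rule="Do not delete files outside the working directory.",
        recovery="Stop and tell the user which boundary the request hit.",
    ),
]

if __name__ == "__main__":
    print(render_constraints(EXAMPLE))
```

The check is deliberately blunt: a prohibition with no next action fails at build time rather than shipping as decoration.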

Roles Become Trust Hierarchies

Simple applications can get away with one role sentence, and “You are a support assistant for Acme” may be enough when the only job is answering policy questions. General assistants and agentic products need something stronger: a way to resolve conflicts between instruction sources.

Anthropic’s constitution for Claude describes the model as serving multiple principals: Anthropic, operators, and users. That is role framing at product scale. It gives the model a structure for deciding whose instruction wins when the sources disagree.

This transfers directly to application prompts. A production agent may receive developer instructions, user preferences, retrieved documents, tool outputs, and memory. Without an explicit hierarchy, the model has to infer which source has priority. That is how retrieved text ends up overriding product policy, or how the model ends up trying to satisfy both a user preference and the safety rule it conflicts with.

A useful role definition therefore does more than set tone. It states the domain, the audience, the decision priorities, and the trust order; the persona is the easy part, while the arbitration rules make it hold under pressure.
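The arbitration rule can be written down as data before it is written as prose. The sketch below assumes one particular trust order (developer over user over memory over tool output over retrieved text); that ordering, and the `Source`, `Instruction`, `resolve`, and `render_trust_order` names, are hypothetical choices for illustration. In a real product the model does the arbitration, so the code mainly documents the rule the prompt has to state.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, List


class Source(IntEnum):
    """Assumed trust order: a higher value wins a conflict."""
    RETRIEVED_DOCUMENT = 1
    TOOL_OUTPUT = 2
    MEMORY = 3
    USER = 4
    DEVELOPER = 5


@dataclass(frozen=True)
class Instruction:
    source: Source
    topic: str   # what the instruction is about, e.g. "refunds"
    text: str


def resolve(instructions: List[Instruction]) -> Dict[str, Instruction]:
    """For each topic, keep the instruction from the most trusted source."""
    winners: Dict[str, Instruction] = {}
    for item in instructions:
        current = winners.get(item.topic)
        if current is None or item.source > current.source:
            winners[item.topic] = item
    return winners


def render_trust_order() -> str:
    """Turn the enum into the sentence that would go in the prompt."""
    ordered = sorted(Source, reverse=True)
    names = " > ".join(s.name.lower().replace("_", " ") for s in ordered)
    return f"When instructions conflict, follow this order: {names}."


if __name__ == "__main__":
    conflict = [
        Instruction(Source.RETRIEVED_DOCUMENT, "refunds", "Refund everything on request."),
        Instruction(Source.DEVELOPER, "refunds", "Refunds over $100 need human approval."),
    ]
    print(resolve(conflict)["refunds"].text)
    print(render_trust_order())
```

Keeping the order in one place means the prompt sentence and any harness-side filtering cannot drift apart.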

Tool Descriptions Are Operating Manuals

Production prompts spend a surprising amount of space on tools, and that is not waste by default; tool descriptions are the model’s manual for what actions exist, when to use them, and how to interpret results.

Claude’s browser-side analysis tool is described with detailed usage boundaries. The description says to use the tool for complex calculations and not for simple arithmetic. The examples matter because they define the line better than a generic sentence could. Without that boundary, the model tends to route too much work through the tool, adding latency and cost for tasks it can handle natively.

Copilot CLI’s shell guidance does similar work for execution mode. In practice, shell work spans several modes: synchronous commands, async jobs, detached processes, and interactive programs, plus file inspection and verification steps. The description functions like a small operating manual because mode selection affects correctness.

This is where tool descriptions become context engineering rather than API documentation. They are part of the planning surface the model sees on every turn.
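A tool description that covers when to use the tool, when not to, and how to read the result can be sketched as a small record. Everything here is hypothetical: the `ToolDescription` class, the `analysis_sandbox` tool name, and the example wording are invented for illustration and are not copied from any vendor's prompt.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ToolDescription:
    name: str
    purpose: str
    use_when: List[str]
    avoid_when: List[str]
    result_meaning: str

    def __post_init__(self) -> None:
        # A description without both boundaries leaves routing to guesswork.
        if not self.use_when or not self.avoid_when:
            raise ValueError(f"{self.name}: give both use_when and avoid_when examples")

    def render(self) -> str:
        """Produce the text block the model would see for this tool."""
        parts = [f"Tool: {self.name}", self.purpose, "Use when:"]
        parts += [f"- {line}" for line in self.use_when]
        parts.append("Do not use when:")
        parts += [f"- {line}" for line in self.avoid_when]
        parts.append(f"Reading the result: {self.result_meaning}")
        return "\n".join(parts)


# Hypothetical tool, echoing the complex-versus-simple arithmetic boundary above.
ANALYSIS_TOOL = ToolDescription(
    name="analysis_sandbox",
    purpose="Runs code to compute results that are error-prone to do by hand.",
    use_when=["Aggregating a large uploaded file", "Multi-step numeric calculations"],
    avoid_when=["Simple arithmetic you can do directly", "Questions that need no computation"],
    result_meaning="Output is the program's printed text; an empty output means nothing was printed.",
)

if __name__ == "__main__":
    print(ANALYSIS_TOOL.render())
```

The negative examples are mandatory on purpose, because the "do not use" side is the part that usually gets left out.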

The Maintenance Lesson

Production prompts get long because they accrete solutions to yesterday’s failures. Some of those solutions still earn their place, while others become dead weight after the model changes, the product changes, or the harness changes.

The prompt needs the same maintenance loop as code: version it, review diffs, run evals, and remove old patches when the underlying failure disappears. The review question for every instruction is whether it still prevents a failure we can observe. Anthropic’s Claude Code quality postmortem is a useful reminder here: one cache optimization that dropped older thinking blocks, and one short system prompt instruction about verbosity, were enough to change coding quality. Context engineering failures can hide in places that do not look like prompts.
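The "remove old patches" step can be sketched as an ablation check: drop one instruction, rerun the evals, and see whether the failure it was added for comes back. The code below is a minimal sketch under stated assumptions. `PromptPatch`, `removal_candidates`, and `stub_evals` are hypothetical names, and the stub returns invented failure rates in place of a real eval suite.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class PromptPatch:
    text: str       # the instruction as written in the prompt
    prevents: str   # name of the failure it was added to stop


# Maps the list of active patches to an observed failure rate per failure name.
EvalFn = Callable[[List[PromptPatch]], Dict[str, float]]


def removal_candidates(
    patches: List[PromptPatch], run_evals: EvalFn, tolerance: float = 0.0
) -> List[PromptPatch]:
    """Return patches whose target failure stays within tolerance once removed."""
    candidates = []
    for patch in patches:
        without = [p for p in patches if p is not patch]
        rates = run_evals(without)
        # An unmeasured failure is treated as still occurring, so the patch is kept.
        if rates.get(patch.prevents, 1.0) <= tolerance:
            candidates.append(patch)
    return candidates


def stub_evals(active: List[PromptPatch]) -> Dict[str, float]:
    """Stand-in for a real eval suite; the numbers are invented."""
    covered = {p.prevents for p in active}
    return {
        "over_long_answers": 0.0,  # pretend a model upgrade already fixed this
        "wrong_tool_choice": 0.0 if "wrong_tool_choice" in covered else 0.3,
    }


PATCHES = [
    PromptPatch("Keep answers under five sentences.", "over_long_answers"),
    PromptPatch("Use the search tool only for facts after the cutoff.", "wrong_tool_choice"),
]

if __name__ == "__main__":
    for patch in removal_candidates(PATCHES, stub_evals):
        print("candidate for removal:", patch.text)
```

A candidate is only a prompt for human review; single-patch ablation misses instructions that matter only in combination.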

Use this standard in review: hard constraints should be explicit and paired with recovery paths. Roles should include priorities and trust hierarchy when multiple instruction sources exist. Tool descriptions should tell the model when to use the tool, when not to, and what the result means. Everything else has to keep proving it belongs.