We've been quietly maintaining an internal prompt library for 14 months. It started as one shared Notion page. It's now a proper system. Here's the structure, the content, and the things we got wrong before we got it right.
The four prompt categories
- Intake, understanding the brief and asking the right clarifying questions.
- Exploration, generating three structurally different directions for any layout.
- Production, variant generation, asset resizing, copy localisation.
- Review, checking outputs against brand kit, brief, and craft baseline.
Each category has 6, 12 prompts, each one tied to a specific designer task. They're not 'mega-prompts' that try to do everything. They're surgical: one prompt, one job, one well-shaped output. The senior designer chains them together.
What our system prompt looks like
Every prompt starts with the same 60-line system block. It establishes who the assistant is (a design production tool, not a creative director), what brand it's working in (the kit gets injected from Que), and what guardrails apply (no copy beyond a fixed length, no asset generation that bypasses review).
Most prompt failures aren't prompt failures. They're scope failures. The model is trying to do five jobs because the prompt asked for five jobs.
Karan Bhatia
The review checklist
Every AI output gets a four-step human review before it ships:
- Brand check, does this respect the kit (colour, type, voice)?
- Brief check, does this answer the actual question the founder asked?
- Craft check, would I be proud to ship this with my name on it?
- Trade-off check, what got worse to make something get better, and is the trade-off worth it?
If any check fails, the senior designer either iterates by hand or kicks back to a different prompt. The agent doesn't ship anything alone. Ever.
What we got wrong
First: we tried to build agentic loops where the model critiqued its own work. It was worse than no critique. The model is bad at telling you what's not on-brand because it doesn't know what 'on-brand' means in your team. It only knows what it means on average across the internet.
Second: we tried to consolidate everything into one giant prompt. It was unreadable, unmaintainable, and produced worse outputs. Surgical prompts won. Always.
Third: we underinvested in the brand kit format. The model can only be as on-brand as the document it's reading. We rewrote our brand kit format three times before we got one that AI agents could parse reliably. That document is now the most important file in the team.



