Every Response Is the Brand. Meet the Agents That Protect It

The Hanifa AI Service Team

An agentic customer service team built for SMBs that can’t afford to get it wrong.

6 Agents · E2E Workflow · 85%+ Brand Voice Match · Human-in-the-Loop (HITL)
Built by Maurisa Westbury · Q1 2026 · Claude (Haiku + Sonnet) · RAG · Multi-Agent · Production pipeline in development
The Project

Hanifa is a luxury fashion brand that has served women in sizes 0–24 for over 14 years. After a customer service crisis shook the community’s trust in the brand, I saw a product problem worth solving: how does a small business maintain quality customer communication when the team is small, the stakes are high, and one careless response can undo years of trust?

I built a team of six AI agents to handle that workflow. Each one has a defined role, scoped knowledge, and clean handoffs. Not a chatbot. Not a template engine. A system that thinks before it writes.

Business Context
The Problem

A yearly pre-order event that had always gone smoothly. In 2025, production delays pushed orders back two months. No updates. No timelines. No acknowledgment from a brand customers had trusted for over a decade. The community took to social media. Had an agentic system like this been in place, timely, on-brand communication could have gone out before the frustration became a crisis.

Post-Crisis Communications

The crisis broke trust. Now every single interaction is a chance to rebuild it. These agents don’t just handle customer service; they protect the brand’s reputation while restoring the relationship. Body positivity, empathy, and community respect aren’t just guidelines; they’re built into every agent’s instructions.

Production Pause

No new collections. No restock timelines. And customers are asking about items that aren’t coming back. Every agent knows this and communicates with the sensitivity that a brand in recovery demands. No false promises. No deflection. Honest, warm responses that acknowledge the disappointment and keep the door open.

Independent build. I am not affiliated with Hanifa; I identified a public CX challenge and built a solution.

Architecture
What I Built
Meet the Agents
Six agents. Defined roles. Clean handoffs. One workflow.
Agent Design

Each agent has one job, and explicit rules about what it doesn’t do.

Six specialized agents organized by function: sense, orchestrate, research internally, research externally, write, and evaluate. No agent crosses its lane. I can test, iterate, and improve each one independently without breaking the pipeline.

Why 6 agents instead of one

One agent doing everything is fast to build and impossible to debug. With six agents, when something breaks, I know exactly where.

The right model for the right job

RECEPTION runs Haiku, fast and consistent for classification. ATELIER runs Sonnet because it needs voice and personality. MAÎTRE runs Sonnet Extended Thinking at temperature 0.0 because the final quality gate gets zero randomness.
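
Here’s a minimal sketch of what that model-and-temperature mapping could look like in configuration. The agent names and temperatures come from this page; the model identifiers and the config shape are illustrative assumptions, not the production setup.

```python
# Illustrative per-agent model/temperature config; model IDs are placeholders.
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentConfig:
    model: str            # model identifier handed to the LLM API
    temperature: float    # lower = more deterministic output
    extended_thinking: bool = False

AGENTS = {
    "RECEPTION": AgentConfig(model="claude-haiku-4-5", temperature=0.2),
    # CONCIERGE runs 0.1–0.3 depending on the task; 0.2 shown as the midpoint.
    "CONCIERGE": AgentConfig(model="claude-sonnet-4-5", temperature=0.2),
    "SOMMELIER": AgentConfig(model="claude-sonnet-4-5", temperature=0.2),
    "CURATOR":   AgentConfig(model="claude-haiku-4-5", temperature=0.2),
    "ATELIER":   AgentConfig(model="claude-sonnet-4-5", temperature=0.6),
    "MAITRE":    AgentConfig(model="claude-sonnet-4-5", temperature=0.0,
                             extended_thinking=True),
}
```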

What each agent doesn’t do

RECEPTION senses but never drafts. CONCIERGE orchestrates but never writes. MAÎTRE evaluates but never rewrites. Every boundary is explicit. Same discipline you’d enforce across a cross-functional team where expertise is critical.

Why knowledge boundaries matter

SOMMELIER searches the internal knowledge base. CURATOR searches the web. They never cross. Mixing sources is how AI gives confident wrong answers. In any domain where accuracy matters, you need to know exactly where your agent’s knowledge comes from.
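
One way to make that boundary structural rather than behavioral is to hand each research agent only its own retrieval tool. The sketch below is hypothetical; the function names and stubs are assumptions, not the real retrieval layer.

```python
# Hypothetical tool scoping: each research agent only ever sees its own source.
from typing import Callable

def search_internal_kb(query: str) -> list[str]:
    """Return passages from the curated internal knowledge base (stub)."""
    return []  # the real version would query the KB / vector store

def search_web(query: str) -> list[str]:
    """Return cited snippets from live web search (stub)."""
    return []  # the real version would call a search API

AGENT_TOOLS: dict[str, list[Callable[[str], list[str]]]] = {
    "SOMMELIER": [search_internal_kb],  # internal only, never the web
    "CURATOR":   [search_web],          # external only, never the KB
}
```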

See the PRD ↓
RECEPTION
Sentiment Analysis & Guest Understanding

Reads the emotional temperature of every incoming email. Classifies tone and urgency so every downstream agent knows exactly what it’s walking into before it responds.

Claude Haiku 4.5 · Temp 0.2
CONCIERGE
Orchestration & Intelligent Routing

The boss. Classifies, coaches downstream agents, manages routing and revision loops.

Claude Sonnet 4.5 · Temp 0.1–0.3
SOMMELIER
Internal Knowledge Expert

Searches the KB to ground every response in verified, accurate information. No guessing, no hallucinating. If it’s not in the KB, it says so.

Claude Sonnet 4.5 · Temp 0.2
CURATOR
External Intelligence Specialist

Conditionally deployed. Fills external gaps: shipping status, industry context, real-time information the knowledge base doesn’t carry.

Claude Haiku 4.5 · Temp 0.2 · Conditional
ATELIER
Communication Craftsmanship

The MVP of this team. Shoulders the responsibility of rebuilding trust through every email it drafts. Writes in the brand’s luxury voice using industry communication best practices.

Claude Sonnet 4.5 · Temp 0.6
MAÎTRE
Evaluation & Quality Standards

Final quality gate. Every draft scored against a 7-dimension rubric before a human ever sees it. Nothing ships without passing.

Sonnet 4.5 Extended Thinking · Temp 0.0
Workflow
How It Works
A single email triggers the right follow-ups across the entire team.
Orchestration

Every email follows a pipeline with built-in self-correction and human checkpoints.

CONCIERGE reads the incoming email plus RECEPTION’s sentiment analysis, then coaches every downstream agent with situation-specific direction. Not generic instructions. Tailored guidance based on what this particular customer needs.
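
As a rough illustration of that coaching step, the hypothetical function below turns RECEPTION’s read into per-agent direction; the field names and example guidance are assumptions, not the production prompts.

```python
# Hypothetical coaching brief built from RECEPTION's sentiment analysis.
def coaching_brief(sentiment: dict) -> dict[str, str]:
    """Turn RECEPTION's read into situation-specific direction for each agent."""
    if sentiment.get("tone") == "frustrated":
        return {
            "SOMMELIER": "Prioritize delay and order-status policies in the KB search.",
            "ATELIER": "Acknowledge the delay and the frustration before offering any fix.",
            "MAITRE": "Weight the empathy-match dimension heavily on this draft.",
        }
    return {
        "SOMMELIER": "Retrieve the most relevant policy passages.",
        "ATELIER": "Warm, welcoming tone; keep it concise.",
        "MAITRE": "Standard rubric weighting.",
    }
```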

Not every interaction needs every agent

CURATOR only activates when CONCIERGE identifies a gap the internal KB can’t fill. Routine emails skip it. Cost and latency stay low for the simple ones; capability is there for the complex ones.
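
A minimal sketch of that gate, assuming CONCIERGE checks whether the internal KB came back empty or the question is about live information; the topic labels are hypothetical.

```python
# Hypothetical routing check: CURATOR is only invoked when the KB leaves a gap.
EXTERNAL_TOPICS = {"shipping_status", "carrier_delay", "industry_context"}

def needs_external_research(kb_passages: list[str], topics: set[str]) -> bool:
    """True when live external research should fill what the KB can't."""
    kb_is_empty = len(kb_passages) == 0
    asks_about_live_info = bool(topics & EXTERNAL_TOPICS)
    return kb_is_empty or asks_about_live_info
```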

When a draft fails, the system knows why

When a draft fails eval, CONCIERGE doesn’t start over. It routes to the specific fix: tone adjustment, deeper KB search, external research, or full rework. The system diagnoses and prescribes.

Three tries, then human hands

Maximum 3 revision loops, then it stops and escalates. The system never loops indefinitely. If AI can’t get it right in three passes, the right answer is a person.
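
The sketch below combines the two behaviors above, targeted fixes and the three-pass cap, into one loop. It is illustrative only; the failure types, routing table, and function signatures are assumptions.

```python
# Illustrative revision loop: route to the specific fix, escalate after three passes.
MAX_REVISIONS = 3

FIX_ROUTES = {
    "tone": "ATELIER",               # tone adjustment
    "missing_kb_fact": "SOMMELIER",  # deeper KB search
    "missing_live_info": "CURATOR",  # external research
    "structure": "ATELIER",          # full rework
}

def run_revision_loop(draft: str, evaluate, revise) -> tuple[str, str]:
    """Return ('approved', draft) when the eval passes,
    or ('escalate_to_human', draft) after three failed passes."""
    for _ in range(MAX_REVISIONS):
        verdict = evaluate(draft)              # MAÎTRE's rubric result
        if verdict["passed"]:
            return "approved", draft
        fixer = FIX_ROUTES.get(verdict["failure_type"], "ATELIER")
        draft = revise(fixer, draft, verdict)  # targeted fix, not a restart
    return "escalate_to_human", draft
```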

Why the system drafts but never sends

Every email goes through quality evaluation, a routing decision, and human approval before it reaches the customer. Whether it’s a body image misstep in fashion or a tax calculation error in payroll, the only pattern that works is AI proposes, human decides.

Voice isn’t a nice-to-have when you’re rebuilding trust

Every word either rebuilds trust or erodes it. The writing agent was trained on a detailed voice analysis so the output sounds like the brand, every time.

Without the Agentic Service Team
  • Single shared inbox, no triage
  • Manual responses, hours to draft
  • No consistent brand voice
  • No quality check before sending
  • One bad response away from a crisis
With the Agentic Service Team
  • Every email triaged by sentiment and urgency
  • Drafted in minutes, in the brand’s voice
  • 7-dimension quality evaluation on every draft
  • Intelligent routing when something needs a fix
  • Built-in escalation for press, legal, and crisis scenarios
  • Nothing reaches the customer without human approval
Evidence
The PM Work
What I produce before a single agent gets built.
Excerpt from the Hanifa AI Service Team PRD. Every build starts with one.
Product Requirements Document
Hanifa AI Service Team
Agentic Customer Service System
Version · Author · Date
v0.1 · Maurisa Westbury · March 2026
Executive Card
Problem: Hanifa receives customer emails about orders, returns, sizing, press, events, and wholesale. All handled manually, inconsistently, and often delayed. A 2025 pre-order crisis exposed how critical fast, empathetic, on-brand communication is to brand survival.
Users: Hanifa customers (loyal community, sizes 0–24) + human reviewer and approver
MVP Scope: Automated email triage → sentiment analysis → research → draft in brand voice → quality evaluation → human review → send. Every email is human-approved.
Workflow: Gmail → Filter → RECEPTION → CONCIERGE → SOMMELIER → CURATOR → ATELIER → MAÎTRE → CONCIERGE → Slack → Human → Send
Top 3 Risks: 1. Pre-order/delay info may not be in the KB; CURATOR fills real-time gaps. 2. Brand voice is specific; ATELIER needs detailed voice training. 3. Press and wholesale emails need hard escalation rules, not AI drafts.
Knowledge Guardrails: Agents only reference verified KB content or cited external sources. No fabricated policies, no invented timelines, no promises the brand can’t keep. If the answer isn’t in the KB, the agent says so.
Success Metrics
KPI · Target · How Measured
Email-to-draft · < 1 hour · Workflow timestamp vs Gmail receipt
Approval rate · > 70% · Approve/edit/reject tracking in Slack
Voice match · > 85% · Observational eval using rubric
AI cost/email · < $0.50 · Token usage logs
Review time · < 3 min · Timed during testing
Hallucination · 0% · Zero fabricated policies or prices
The framework behind every agent on this team. Five elements, applied consistently.
Agent Design Framework
RCICE
A repeatable structure for designing reliable AI agents
R
Role
What the agent is and who it serves. One sentence that defines identity before anything else.
The agent reads this first and uses it as the lens for every instruction that follows.
C
Context
What the agent needs to know about its environment. Curated background knowledge placed on the workbench.
Context is what separates a generic AI response from a grounded one.
I
Instructions
How the agent does its work, step by step. The process, the decision logic, the guardrails.
A well-instructed agent doesn’t guess; it follows a designed workflow.
C
Criteria
What “good” looks like. The quality bar the agent evaluates its own output against before delivering.
Without criteria, agents optimize for producing an answer, not producing a good one.
E
Examples
What ideal output looks like in practice. Few-shot demonstrations that shift the model toward the right patterns.
Examples do more to shape agent behavior than any amount of abstract instruction.
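
To make the framework concrete, here’s a minimal sketch of how the five RCICE elements could compose into a single system prompt. The section labels and assembly order mirror the framework above; the function itself is an illustration, not the production prompt builder.

```python
# Hypothetical RCICE prompt assembly: Role first, Examples last.
def build_system_prompt(role: str, context: str, instructions: list[str],
                        criteria: list[str], examples: list[str]) -> str:
    """Compose the five RCICE elements into one system prompt."""
    sections = [
        "ROLE\n" + role,
        "CONTEXT\n" + context,
        "INSTRUCTIONS\n" + "\n".join(f"{i}. {step}" for i, step in enumerate(instructions, 1)),
        "CRITERIA\n" + "\n".join(f"- {c}" for c in criteria),
        "EXAMPLES\n" + "\n\n".join(examples),
    ]
    return "\n\n".join(sections)
```
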
Quality
Evaluation & Guardrails
Every draft scored. Every failure caught. Every edge case tested.
Eval Framework

MAÎTRE doesn’t get creative. It just scores the work.

Every draft gets scored against 7 dimensions before a human sees it. A critical failure doesn’t just get flagged. It gets blocked.

Five questions on every draft

Did we address the person, not the ticket? Did we solve AND elevate? Did we anticipate the follow-up? Did we leave the door open? Would we be comfortable if this were posted publicly? 4 out of 5 to pass.

The edge cases that break AI

Five test scenarios: crisis-adjacent complaint, body positivity sizing question, new customer celebration, press inquiry where the correct output is NO draft at all, and a standard return. If it handles these five, it handles the middle.

Knowing when to step aside

Not every email should get an AI response. Press inquiries, legal issues, active crises: the system recognizes these and stops. No draft generated. Human only. Knowing when NOT to respond is just as important as knowing how.
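
That guardrail can sit in front of the entire drafting pipeline. The sketch below is an assumption about how it might be encoded; the category labels are hypothetical.

```python
# Hypothetical pre-draft guardrail: these categories never get an AI draft.
HUMAN_ONLY = {"press_inquiry", "legal", "active_crisis"}

def should_draft(category: str) -> bool:
    """False means stop here: no draft, straight to a human."""
    return category not in HUMAN_ONLY
```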

MAÎTRE’s 7-Dimension Rubric
Guardrails & Escalation
Does this need a human instead of AI? Press inquiries, legal questions, crisis scenarios get flagged and routed. No draft generated.
Grounded in Knowledge, Not Guessing
Every claim in the email traces back to the KB or verified research. Nothing made up. Nothing hallucinated.
Answer the Question and the Next One
Did we address what they asked? Did we anticipate what they’ll ask next? Both matter.
Sounds Like the Brand
Luxury customers deserve communication that meets them where they are. Elevated, respectful, and human. Not corporate, not casual, not generic.
Empathy Match
Is the response calibrated to how this customer is feeling right now? A frustrated customer and a first-time buyer need completely different energy.
Clean and Professional
Grammar, formatting, structure. The basics that signal a brand takes its communication seriously.
Respect for the Community
Body-positive language. Cultural sensitivity. No tone-deaf responses to a community that’s watching closely and has every right to.

Scoring: EXCELLENT / GOOD / ACCEPTABLE / NEEDS WORK / CRITICAL FAILURE. Any critical failure blocks the draft entirely.
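
A minimal sketch of how that gate could be encoded. The dimension names and scoring levels come from the rubric above; the pass threshold below (at least ACCEPTABLE on every dimension) and the code shape are assumptions.

```python
# Illustrative scoring gate: any critical failure blocks the draft outright.
from enum import IntEnum

class Score(IntEnum):
    CRITICAL_FAILURE = 0
    NEEDS_WORK = 1
    ACCEPTABLE = 2
    GOOD = 3
    EXCELLENT = 4

DIMENSIONS = [
    "guardrails_and_escalation",
    "grounded_in_knowledge",
    "answers_question_and_next",
    "sounds_like_the_brand",
    "empathy_match",
    "clean_and_professional",
    "respect_for_the_community",
]

def passes_gate(scores: dict[str, Score]) -> bool:
    """Block on any critical failure; otherwise require ACCEPTABLE or better everywhere."""
    if any(s == Score.CRITICAL_FAILURE for s in scores.values()):
        return False
    return all(s >= Score.ACCEPTABLE for s in scores.values())
```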

My Process

How I Ship Product

Discovery to delivery. Engineering-ready specs. The right conversations with the right teams at the right time.

01
Discovery

What’s the problem, who owns it, and what does solved look like? Get stakeholders aligned on what we’re building and why.

02
Research & Knowledge Design
+

The knowledge base is part of the product. I go deep on domain research, leaning on existing documentation and human conversations, so I walk into engineering with context, not assumptions.

03
Requirements
+

Engineering-ready specs. Agent roles, boundaries, guardrails, edge cases, acceptance criteria. Clear enough that engineering can execute.

04
Build
+

I work cross-functionally, speaking engineering’s language well enough to have trade-off conversations and drive informed decisions about architecture, model selection, and system behavior.

05
Test & Iterate

Define and run the scenarios. Observe the patterns. Adjust. Test again. Eval criteria are documented upfront so QA isn’t a vibe check; it’s a measured loop.

06
Ship & Monitor

Human checkpoints active. Feedback collection in place. Shipping is the beginning, not the end. Iterate based on what the data says.

See the framework ↓