An agentic customer service team built for SMBs that can’t afford to get it wrong.
Hanifa is a luxury fashion brand that has served women in sizes 0–24 for over 14 years. After a customer service crisis shook the community's trust in the brand, I saw a product problem worth solving: how does a small business maintain quality customer communication when the team is small, the stakes are high, and one careless response can undo years of trust?
I built a team of six AI agents to handle that workflow. Each one has a defined role, scoped knowledge, and clean handoffs. Not a chatbot. Not a template engine. A system that thinks before it writes.
The brand runs a yearly pre-order event that had always gone smoothly. In 2025, production delays pushed orders back two months. No updates. No timelines. No acknowledgment from a brand customers had trusted for over a decade. The community took to social media. With an agentic system like this in place, timely, on-brand communication could have gone out before the frustration became a crisis.
The crisis broke trust. Now every single interaction is a chance to rebuild it. These agents don't just handle customer service; they protect the brand's reputation while restoring the relationship. Body positivity, empathy, and community respect aren't just guidelines; they're built into every agent's instructions.
No new collections. No restock timelines. And customers are asking about items that aren't coming back. Every agent knows this and communicates with the sensitivity that a brand in recovery demands. No false promises. No deflection. Honest, warm responses that acknowledge the disappointment and keep the door open.
Independent build. I am not affiliated with Hanifa; I identified a public CX challenge and built a solution.
Six specialized agents organized by function: sense, orchestrate, research internally, research externally, write, and evaluate. No agent crosses its lane. I can test, iterate, and improve each one independently without breaking the pipeline.
One agent doing everything is fast to build and impossible to debug. Six agents means when something breaks, I know exactly where.
RECEPTION runs Haiku: fast and consistent for classification. ATELIER runs Sonnet because it needs voice and personality. MAÎTRE runs Sonnet with extended thinking at temperature 0.0, because the final quality gate gets zero randomness.
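The model-to-agent mapping can be sketched as a simple configuration table. This is illustrative only: the model IDs, the `AGENT_CONFIG` shape, and every temperature except MAÎTRE's 0.0 are my assumptions, not the actual build.

```python
# Hypothetical per-agent configuration; model IDs and most values are
# illustrative assumptions (only MAÎTRE's temperature 0.0 is stated above).
AGENT_CONFIG = {
    "RECEPTION": {"model": "claude-haiku",  "temperature": 0.2},  # fast, consistent classification
    "CONCIERGE": {"model": "claude-sonnet", "temperature": 0.4},  # orchestration and coaching
    "SOMMELIER": {"model": "claude-sonnet", "temperature": 0.2},  # internal KB retrieval only
    "CURATOR":   {"model": "claude-sonnet", "temperature": 0.2},  # external web research only
    "ATELIER":   {"model": "claude-sonnet", "temperature": 0.7},  # brand-voice drafting
    "MAITRE":    {"model": "claude-sonnet", "temperature": 0.0,   # deterministic final gate
                  "extended_thinking": True},
}
```

Keeping this in one place is what makes per-agent iteration cheap: swapping RECEPTION's model or tightening ATELIER's temperature touches one entry, not the pipeline.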
RECEPTION senses but never drafts. CONCIERGE orchestrates but never writes. MAÎTRE evaluates but never rewrites. Every boundary is explicit. Same discipline you’d enforce across a cross-functional team where expertise is critical.
SOMMELIER searches the internal knowledge base. CURATOR searches the web. They never cross. Mixing sources is how AI gives confident wrong answers. In any domain where accuracy matters, you need to know exactly where your agent’s knowledge comes from.
Reads the emotional temperature of every incoming email. Classifies tone and urgency so every downstream agent knows exactly what they’re walking into before they respond.
The boss. Classifies, coaches downstream agents, manages routing and revision loops.
Searches the KB to ground every response in verified, accurate information. No guessing, no hallucinating. If it’s not in the KB, it says so.
Conditionally deployed. Fills external gaps: shipping status, industry context, real-time information the knowledge base doesn’t carry.
The MVP of this team. Shoulders the responsibility of rebuilding trust through every email it drafts. Writes in the brand’s luxury voice using industry communication best practices.
Final quality gate. Every draft scored against a 7-dimension rubric before a human ever sees it. Nothing ships without passing.
CONCIERGE reads the incoming email plus RECEPTION’s sentiment analysis, then coaches every downstream agent with situation-specific direction. Not generic instructions. Tailored guidance based on what this particular customer needs.
CURATOR only activates when CONCIERGE identifies a gap the internal KB can’t fill. Routine emails skip it. Cost and latency stay low for the simple ones; capability is there for the complex ones.
When a draft fails eval, CONCIERGE doesn’t start over. It routes to the specific fix: tone adjustment, deeper KB search, external research, or full rework. The system diagnoses and prescribes.
Maximum 3 revision loops, then it stops and escalates. The system never loops indefinitely. If AI can’t get it right in three passes, the right answer is a person.
Every email goes through quality evaluation, a routing decision, and human approval before it reaches the customer. Whether it’s a body image misstep in fashion or a tax calculation error in payroll, the only pattern that works is AI proposes, human decides.
Every word either rebuilds trust or erodes it. The writing agent was trained on a detailed voice analysis so the output sounds like the brand, every time.
| Version | Author | Date |
|---|---|---|
| v0.1 | Maurisa Westbury | March 2026 |
Every draft gets scored against 7 dimensions before a human sees it. A critical failure doesn’t get flagged. It gets blocked.
Did we address the person, not the ticket? Did we solve AND elevate? Did we anticipate the follow-up? Did we leave the door open? Would we be comfortable if this were posted publicly? 4 out of 5 to pass.
Five test scenarios: crisis-adjacent complaint, body positivity sizing question, new customer celebration, press inquiry where the correct output is NO draft at all, and a standard return. If it handles these five, it handles the middle.
Not every email should get an AI response. Press inquiries, legal issues, active crises: the system recognizes these and stops. No draft generated. Human only. Knowing when NOT to respond is just as important as knowing how.
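The human-only rule reduces to a check that runs before any drafting starts. The category labels here are hypothetical stand-ins for whatever RECEPTION and CONCIERGE actually emit.

```python
# Categories that bypass drafting entirely; labels are illustrative.
HUMAN_ONLY = {"press_inquiry", "legal_issue", "active_crisis"}


def should_draft(category: str) -> bool:
    """False means the pipeline stops here: no draft is generated at all."""
    return category not in HUMAN_ONLY
```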
Scoring: EXCELLENT / GOOD / ACCEPTABLE / NEEDS WORK / CRITICAL FAILURE. Any critical failure blocks entirely.
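The blocking behavior can be sketched as a small gate function. Two assumptions to flag: the seven dimension names are not enumerated here, and treating NEEDS WORK as "revise" rather than an outright failure is my reading, not a documented rule.

```python
# A sketch of the quality gate; the pass threshold for NEEDS WORK is assumed.
PASSING = {"ACCEPTABLE", "GOOD", "EXCELLENT"}


def quality_gate(scores: dict[str, str]) -> str:
    """scores maps each rubric dimension to one of the five grades."""
    if "CRITICAL FAILURE" in scores.values():
        return "BLOCKED"   # blocked entirely, never merely flagged
    if all(grade in PASSING for grade in scores.values()):
        return "PASS"
    return "REVISE"        # routed back through the revision loop
```

The order matters: the critical-failure check runs first, so a draft that scores EXCELLENT on six dimensions and CRITICAL FAILURE on one still never ships.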
Discovery to delivery. Engineering-ready specs. The right conversations with the right teams at the right time.
What’s the problem, who owns it, and what does solved look like? Get stakeholders aligned on what we’re building and why.
The knowledge base is part of the product. I go deep on domain research, drawing on existing documentation and conversations with the people who do the work, so I walk into engineering with context, not assumptions.
Engineering-ready specs. Agent roles, boundaries, guardrails, edge cases, acceptance criteria. Clear enough that engineering can execute.
I work cross-functionally, speaking the language well enough to have trade-off conversations and drive informed decisions about architecture, model selection, and system behavior.
Define and run the scenarios. Observe the patterns. Adjust. Test again. Eval criteria documented upfront so QA isn’t a vibe check, it’s a measured loop.
Human checkpoints active. Feedback collection in place. Shipping is the beginning, not the end. Iterate based on what the data says.