If you are evaluating customer service AI vendors right now, the hardest question is not "which is best." It is "which architectural philosophy fits how our team already operates." A wrong fit costs six months and a seven-figure contract before anyone notices the agent is misaligned with the org chart. By 2026, the patterns have stabilized enough that you can map your context to a pattern before you ever take a sales call.
The category sorts cleanly into four shapes, populated by a recognizable cast of vendors: Sierra's Agent OS with its Agent SDK and Agent Studio; Decagon's Agent Operating Procedures, combining natural-language SOPs with code-level guardrails; Intercom Fin's deeply helpdesk-native architecture at $0.99 per resolution; Forethought's knowledge-management-first multi-product offering; Cresta's unified Analyze, Augment, and Automate platform; helpdesk-native AI from Zendesk and Salesforce; plus a long tail of self-serve and mid-market players including Ada, My AskAI, Eesel, Wonderchat, and Fini.
Each architecture represents a different philosophy about who controls the agent's behavior, how integrations work, and what the deployment timeline looks like. This post covers each architectural pattern, when it fits, the engineering implications, and the build-vs-buy considerations that come up in practice.
The four architectural philosophies
The category sorts into four patterns:
Pattern A: Standalone enterprise platforms with custom configuration. Sierra and Decagon are the two dominant examples. Heavy implementation lift, deep customization, premium pricing, sales-led procurement.
Pattern B: Helpdesk-native AI integrated with existing platforms. Intercom Fin, Zendesk AI, Salesforce Einstein. Lower cost, faster deployment, bound to the helpdesk ecosystem.
Pattern C: Augmentation-first platforms. Cresta, parts of Forethought. AI alongside human agents rather than replacing them. Useful when the goal is improving agent performance more than reducing headcount.
Pattern D: Self-serve and mid-market. My AskAI, Eesel, Wonderchat, Fini. Faster setup, lower cost, less customization. Suitable for smaller deployments or proof-of-concept stages.
Before going deeper, here is the four-pattern comparison on the dimensions that show up in every procurement conversation.
| Dimension | A. Standalone enterprise | B. Helpdesk-native | C. Augmentation-first | D. Self-serve |
|---|---|---|---|---|
| Deployment time | 4 to 24 weeks | 1 to 4 weeks | 4 to 12 weeks | Days to 2 weeks |
| Integration cost | High (custom SDK work) | Low (native to helpdesk) | Medium (multi-product wiring) | Low (standard connectors) |
| Customization depth | Very high | Bound to helpdesk roadmap | High on agent-assist side | Limited |
| Voice support | Native (Sierra) or growing | Varies by helpdesk | Strong (Cresta) | Usually none |
| Entry price | Reportedly $40K to $350K+ per year | $0.99 per resolution and up | Mid five to six figures per year | $1K to $10K per month |
Each pattern has architectural implications.
Pattern A: Standalone enterprise platforms
Sierra's Agent OS
Sierra's architecture has two layers visible to buyers:
Agent SDK. Where engineers write code that defines the AI's logic, connects to backend systems, and builds skills. New capabilities, for example a new shipping carrier integration, require developer work in the SDK; a hypothetical sketch of that pattern follows below.
Agent Studio. Dashboard where CX teams adjust tone of voice, fine-tune conversations, and monitor performance. Day-to-day operation lives here without code changes.
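Sierra's SDK is proprietary and its interface is not public, so here is a hypothetical Python sketch of the skill pattern the SDK layer implies: a typed function wrapping a backend integration that the agent can invoke. None of these names are Sierra's.

```python
from dataclasses import dataclass

# Hypothetical skill pattern; none of these names are Sierra's API.
@dataclass
class TrackingResult:
    carrier: str
    status: str
    estimated_delivery: str

# Stub carrier clients stand in for real integrations. Supporting a new
# carrier means adding a client here: developer work in the SDK layer.
CARRIER_CLIENTS = {
    "ups": lambda order_id: ("in transit", "2026-03-02"),
    "fedex": lambda order_id: ("delivered", "2026-02-27"),
}

def track_shipment(order_id: str, carrier: str) -> TrackingResult:
    """Skill the agent can invoke to answer 'where is my order?'"""
    if carrier not in CARRIER_CLIENTS:
        raise ValueError(f"no integration for carrier: {carrier}")
    status, eta = CARRIER_CLIENTS[carrier](order_id)
    return TrackingResult(carrier, status, eta)

print(track_shipment("o-42", "ups"))
```

Tone, phrasing, and monitoring around that skill's output would then live in Agent Studio, with no code changes.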
The 2026 expansion: Sierra Agent OS 2.0 with eight products, including Agent Studio 2.0 (configuration), Agent Data Platform (persistent memory and context across interactions), and Live Assist (human escalation tooling). PCI Level 1 compliance enables direct handling of payment-related conversations. The voice agent stack, with its proprietary low-latency interaction, reportedly handled hundreds of millions of calls in 2025, according to Sierra's product announcements. Pricing is outcome-based: customers pay per successful resolution rather than per conversation.
When Sierra fits. Enterprise consumer brands prioritizing brand safety, voice quality, and end-to-end commercial transactions. WeightWatchers, SiriusXM, Sonos, and ADT are publicly cited reference customers, and Sierra has stated in its public materials that it works with a meaningful share of the Fortune 50.
Engineering implications. Significant initial integration work. Sales-led procurement, not self-serve. Deployment 4 to 10 weeks for initial rollouts and 3 to 6 months for complex deployments. Per-account pricing reportedly lands around $200K to $350K+ year one based on partner reports and customer disclosures. Strong for organizations that can absorb the up-front investment.
Trace before you commit to a platform
Before signing a Sierra or Decagon contract, run a two-week shadow trace on your existing volume. Respan's tracing captures every retrieval, tool call, and escalation across whatever stack you point it at, so you can quantify deflection ceiling and failure modes against the vendor's promises during the pilot. Start tracing for free at platform.respan.ai.
Decagon's Agent Operating Procedures
Decagon's architecture combines natural-language SOP definition with code-level guardrails:
AOPs (Agent Operating Procedures). CX teams define agent behavior in plain English: "When a customer asks about refund status, look up the order in the OMS, check the refund policy for that product category, and respond with the appropriate template."
Code-level guardrails. The engineering team builds the integrations (CRM, knowledge bases, payments), configures APIs, and writes validation logic that enforces hard constraints regardless of what the AOP says; a sketch of that validation layer follows below.
Testing infrastructure. Simulation tools, versioning, agent behavior validation, and monitoring ensure predictable performance.
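A minimal sketch of the guardrail idea, with assumed policy thresholds rather than Decagon's actual API: whatever the natural-language AOP decides, the code layer validates the proposed action before anything executes.

```python
# Hypothetical guardrail: hard constraints enforced in code, regardless
# of what the natural-language AOP proposed. Thresholds are assumptions.
MAX_AUTO_REFUND = 100.00
REFUNDABLE_STATUSES = {"delivered", "returned"}

def validate_refund(order: dict, amount: float) -> str:
    """Return 'approve', 'escalate', or 'reject' for a proposed refund."""
    if amount <= 0 or amount > order["total"]:
        return "reject"       # never refund more than was paid
    if order["status"] not in REFUNDABLE_STATUSES:
        return "escalate"     # ambiguous order state: a human decides
    if amount > MAX_AUTO_REFUND:
        return "escalate"     # above the agent's bounded authority
    return "approve"

order = {"total": 200.0, "status": "delivered"}
print(validate_refund(order, 40.0))    # approve
print(validate_refund(order, 150.0))   # escalate: above threshold
print(validate_refund(order, 250.0))   # reject: exceeds order total
```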
When Decagon fits. Companies with technical CX teams that want flexibility and direct control. Internet-native businesses where CX has rapid iteration capability. Reference customers publicly cited on Decagon's site include Avis Budget Group, Hertz, Block, Affirm, Chime, Wealthsimple, Oura Health, Noom, and ClassPass.
Engineering implications. AOPs are not truly no-code. Engineering builds the integration foundation before CX can iterate. Deployment around 6 weeks compared to Sierra's 4 to 10. Single-agent architecture struggles with highly complex multi-step workflows but enables faster iteration on individual workflows. Pricing reportedly lands in the $40K to $160K annual contract range, with per-deflection rates that can dip as low as $0.07 according to vendor materials.
When standalone enterprise platforms fit
Common across both Sierra and Decagon:
- The customer service operation is large enough to justify the implementation cost
- The brand requires deep customization for tone, voice, and behavior
- Cross-channel orchestration across chat, email, and voice is required
- Multi-step task completion (account changes, refunds, complex resolutions) is in scope
- Compliance posture (PCI, HIPAA, SOC 2) is a procurement requirement
The reverse case: small operations, simple FAQs, or organizations with limited engineering capacity will struggle to justify the cost and timeline.
Pattern B: Helpdesk-native AI
Intercom Fin
Architecture deeply integrated with Intercom's helpdesk: customer profile, ticket history, and conversation data accessible to the agent without separate integration work. Pricing at $0.99 per resolution, published openly by Intercom, is one of the more transparent models in the category. Cross-channel within Intercom's surfaces, with native handoff to human agents inside the same tool.
When Fin fits. Already on Intercom for human-agent operations. Want incremental AI deployment without migrating to a standalone platform. Smaller and mid-market deployments where the price-per-resolution model works.
Zendesk AI
Similar architecture, deeply integrated with Zendesk. Available in tiered configurations from basic AI suggestions to autonomous resolution. Native to Zendesk's data model.
When Zendesk AI fits. On Zendesk and want AI features integrated with existing setup. Lower friction adoption than introducing a separate platform.
Engineering implications of helpdesk-native
Pros. Fastest deployment, lowest integration cost, customer data already in the right place, native escalation to human agents, more predictable pricing.
Cons. Bound to the helpdesk's capabilities and roadmap. Cross-platform deployment is harder if the operation spans Intercom, Zendesk, and Salesforce. Voice capabilities vary by helpdesk.
The trade is integration depth for integration breadth. For organizations on a single helpdesk, the trade is favorable.
Pattern C: Augmentation-first platforms
Cresta
Unified platform across:
- AI Agent. Autonomous resolution where appropriate.
- Agent Assist. Real-time guidance for human agents during conversations, including suggested responses, knowledge surfacing, and sentiment alerts.
- Knowledge Agent. Proactive in-workflow answers surfaced to agents and customers.
- Conversation Intelligence. Analysis of all interactions for quality, training, and pattern detection.
Differentiation: AI as augmentation alongside automation. Value does not depend solely on containment metrics. The platform improves human agent quality while automating routine queries.
When Cresta fits. Organizations that want efficiency gains from automation and quality gains from human augmentation on the same platform. Operations where reducing human headcount is not the primary goal.
Forethought
Multi-product offering:
- Solve. Chatbot for autonomous resolution. Forethought has reported 87% deflection at Grammarly and 52 to 65% self-serve at Upwork in published case studies.
- Triage. Smart routing to right agents.
- Assist. Real-time agent guidance.
- Discover. Knowledge gap identification and trending topic detection.
- Agent QA. Automated quality scoring across 100% of conversations against custom rubrics.
Knowledge-management-first: 18 knowledge management integrations including Confluence, Notion, and Guru, with strong compliance posture across SOC 2 Type II, HIPAA, GDPR, and CCPA.
When Forethought fits. Organizations with rich knowledge management infrastructure that want to leverage it. Knowledge-heavy support volumes including technical products and complex services. Compliance-sensitive industries.
Augmentation needs evals too
Pattern C platforms sound safer than fully autonomous agents, but they introduce their own failure modes: bad suggestions, distracted agents, suggestion fatigue. Respan's eval suite (faithfulness, citation accuracy, refusal correctness) works the same on agent-assist suggestions as on customer-facing answers. Score every suggestion before it lands in the agent UI. See the docs at platform.respan.ai.
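Respan's evaluator interface is not reproduced here; the sketch below shows the shape of the gate in plain Python, with a crude token-overlap check standing in for a real faithfulness evaluator. Every name is an assumption.

```python
# Toy suggestion gate: score an agent-assist suggestion against the
# retrieved context before it reaches the agent UI. Token overlap is a
# crude stand-in for a real faithfulness evaluator.
def faithfulness(suggestion: str, context: str) -> float:
    sug_tokens = set(suggestion.lower().split())
    ctx_tokens = set(context.lower().split())
    if not sug_tokens:
        return 0.0
    return len(sug_tokens & ctx_tokens) / len(sug_tokens)

def gate_suggestion(suggestion: str, context: str, threshold: float = 0.6):
    if faithfulness(suggestion, context) < threshold:
        return None   # suppress: do not show a low-faithfulness suggestion
    return suggestion

context = "Refunds are available within 30 days of delivery for unused items."
print(gate_suggestion("Refunds are available within 30 days", context))
print(gate_suggestion("We offer lifetime refunds on all items", context))
```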
Pattern D: Self-serve and mid-market
Self-serve platforms (My AskAI, Eesel, Fini, Wonderchat) emphasize:
- Setup in days rather than weeks
- Self-serve signup
- Lower cost (often $1K to $10K monthly versus $40K+ annual contracts)
- Standard helpdesk integrations
- Less customization depth
When self-serve fits. Smaller operations, typically under 5,000 monthly tickets. Proof-of-concept stage. Organizations exploring AI customer service before committing to enterprise platforms. Knowledge-graph-only architectures, for example Fini's reasoning-first approach with no generative hallucination by design, suit organizations that need accuracy guarantees over flexibility.
Build vs buy
For most organizations, buy is the right answer. Standalone enterprise, helpdesk-native, or self-serve all beat custom-built customer service AI on time-to-value, ongoing maintenance, and feature breadth.
The exceptions where build makes sense:
Highly specialized domain. The platforms target broad customer service workflows. Some domains, including technical product support, regulated medical, and specialized B2B, require depth the platforms do not provide.
Existing platform investment. Organizations with mature CX engineering teams that have already built foundational pieces may extend them rather than replace.
Strategic differentiation. A few organizations treat customer service experience as core competitive advantage where outsourcing the architecture is unacceptable. This is rare and usually wrong.
For most customer service AI initiatives, the architectural decision is which platform fits, not whether to build. The platform-comparison work happens before the build-vs-buy debate, not after.
How a query routes through Pattern A
To make Pattern A concrete, here is how a refund query routes through a standalone enterprise platform like Sierra or Decagon: the query is classified, order and policy context are retrieved from backend systems, the AOP or skill proposes an action, a bounded-authority gate validates that action against hard limits, and anything outside policy escalates to a human.
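A minimal, runnable Python sketch of that route, with every name hypothetical; the shape matters, not any vendor's actual API.

```python
# Hypothetical Pattern A route for a refund query; every name here is
# illustrative. The authority gate is the load-bearing piece: every
# proposed action passes through it and is logged at one choke point.
ORDERS = {"c1": {"category": "apparel", "total": 80.0,
                 "status": "delivered", "refund_state": "processing"}}
POLICIES = {"apparel": {"max_auto_refund": 100.0}}
AUDIT_LOG = []

def classify(query: str) -> str:
    return "refund_status" if "refund" in query.lower() else "other"

def authority_gate(action: dict, policy: dict) -> str:
    AUDIT_LOG.append(action)             # the one auditable choke point
    if action["type"] != "send_status":  # money-moving actions would be
        return "outside scope"           # checked against policy limits here
    return "approve"

def handle_refund_query(query: str, customer_id: str) -> str:
    if classify(query) != "refund_status":           # 1. classify intent
        return "escalated: unrecognized intent"
    order = ORDERS[customer_id]                      # 2. backend lookup (OMS)
    policy = POLICIES[order["category"]]             # 3. policy retrieval
    action = {"type": "send_status",                 # 4. AOP/skill proposes
              "body": f"Your refund is {order['refund_state']}."}
    if authority_gate(action, policy) == "approve":  # 5. bounded-authority gate
        return action["body"]                        # 6a. act within policy
    return "escalated to a human agent"              # 6b. everything else

print(handle_refund_query("Where is my refund?", "c1"))
```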
The bounded-authority gate is the load-bearing piece. Both Sierra's Agent SDK and Decagon's code-level guardrails exist to make sure the AOP or skill cannot authorize anything outside policy. The gate is also where outcome attribution becomes tractable, since every action passes through one auditable choke point.
Choosing among patterns
A simplified decision flow:
| Profile | Recommended pattern |
|---|---|
| Enterprise consumer brand, voice priority, deep customization | Pattern A: Sierra |
| Enterprise with technical CX team, fast iteration desired | Pattern A: Decagon |
| Already on Intercom, want incremental AI | Pattern B: Intercom Fin |
| Already on Zendesk, want incremental AI | Pattern B: Zendesk AI |
| Want efficiency and agent quality on same platform | Pattern C: Cresta or Forethought |
| Knowledge-heavy support, strong compliance | Pattern C: Forethought |
| Small to mid-market, proof-of-concept stage | Pattern D: Fini, My AskAI, Eesel |
| Strict accuracy requirement, regulated industry | Pattern D: Fini (knowledge graph approach) |
The decision flow offers starting points, not final answers. Reference customer conversations and proof-of-concept evaluations are worth running before commitment.
Common architectural mistakes
Patterns that show up across deployed customer service AI in 2025 and 2026:
Treating chat and voice as the same architecture. Voice has different latency, interruption, and emotional handling requirements. Sierra's voice surpassing text in October 2025 reflects how distinct the channel is. Architectures designed for chat first and bolted onto voice produce poor voice experiences.
Single-agent mindset. Both Sierra and Decagon can fall into building one giant agent that does everything. Better: smaller, specific agents with clear boundaries. Easier to test, iterate, and bound failure modes.
No bounded authority on actions. The agent can authorize refunds of any size, change any account setting, issue any discount. Better: bounded authority with clear escalation triggers above thresholds.
Hallucination defense as afterthought. Strict RAG, post-generation verification, and adversarial testing built in from day one beat the same defenses bolted on later. Klarna learned this expensively.
No outcome attribution. The team measures deflection but cannot tell which configuration changes improved actual resolution. Without attribution, optimization is guesswork.
Vendor lock-in without exit plan. All standalone platforms create some lock-in across data, integrations, and prompt logic. Documented exit plans are worth maintaining since vendor pricing can change and platforms can be acquired and reshaped.
Make exit plans technical, not theoretical
The most painful version of vendor lock-in is logical: prompts, eval rubrics, and routing logic that only run inside one vendor's runtime. Keep your prompt registry and evals on Respan, even if your serving stack lives inside Sierra or Decagon. If the vendor pivots or pricing changes, the operational substrate moves with you. Talk to us about migration patterns.
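A minimal sketch of the portable-substrate idea, with an assumed file layout standing in for a real registry API: policy prompts are versioned data that the serving stack fetches by name and version, so they move with you.

```python
# Hypothetical portable prompt registry: prompts are versioned data,
# not strings buried in one vendor's runtime. The file layout is assumed.
import json
from pathlib import Path

REGISTRY = Path("prompts")   # e.g. prompts/refund_policy/v3.json

def save_prompt(name: str, version: str, template: str) -> None:
    path = REGISTRY / name
    path.mkdir(parents=True, exist_ok=True)
    (path / f"{version}.json").write_text(json.dumps({"template": template}))

def load_prompt(name: str, version: str) -> str:
    record = json.loads((REGISTRY / name / f"{version}.json").read_text())
    return record["template"]

save_prompt("refund_policy", "v3",
            "Refunds under $100 may be approved automatically; otherwise escalate.")
# The vendor-side agent config only references (name, version). Swapping
# vendors means repointing this loader, not rewriting the prompts.
print(load_prompt("refund_policy", "v3"))
```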
What to expect in the next twelve months
Voice as primary channel for many use cases. Sierra's October 2025 milestone, voice surpassing text, was a leading indicator. The trend is broader, and voice agent quality is improving fast.
Agent-to-agent dynamics. Customer-side LLMs interacting with merchant-side LLMs become more common. The defense is architectural (deterministic authorization), not procedural.
Compliance posture as table stakes. SOC 2 Type II and the relevant industry frameworks (HIPAA, PCI-DSS, ISO 42001) become standard procurement requirements rather than premium tier features.
Outcome-based pricing standardizes. Pay-per-resolution models, used by Sierra and parts of Decagon, align vendor incentives with actual customer success. More vendors will adopt these models or hybrid versions.
Acquisitions and consolidation. Sierra's acquisitions of Opera Tech and Fragment in 2025 and 2026 are examples. The category is consolidating, and buyers should plan for vendors changing shape over multi-year contracts.
How Respan fits
Whether you ship Decagon's AOPs, Sierra's Agent OS, Intercom Fin's helpdesk-native model, or your own from scratch, the operational substrate underneath looks the same. Respan provides that substrate without binding you to one architectural pattern.
- Tracing: every customer turn, every retrieval, every tool call, every escalation decision captured as one connected trace. Auto-instrumented for LangChain, LlamaIndex, Vercel AI SDK, CrewAI, AutoGen, OpenAI Agents SDK. Session IDs stitch multi-turn conversations into a single debuggable trace, which is what bounded-authority architectures actually require.
- Evals: ten built-in evaluators (faithfulness, citation accuracy, refusal correctness, harmfulness) plus LLM-as-judge and custom Python evaluators. Production traffic flows directly into datasets. CI-aware experiments block regressions on resolution quality, hallucinated policy claims, or escalation correctness before deploys ship.
- Gateway: 500+ models behind an OpenAI-compatible interface, semantic caching for the FAQ-shaped tail, fallback chains, per-customer spending caps. The bounded-authority pattern leans hard on the gateway as the place where action APIs and model calls share the same audit trail.
- Prompt management: versioned registry, dev/staging/prod environments with approval workflows, A/B testing in production with one-click rollback. Policy prompts and escalation logic live here, not in code.
- Monitors and alerts: deflection rate, CSAT, escalation accuracy, latency P95, cost per resolution. Slack, email, PagerDuty, webhook. The metrics that decide whether the architecture is working are first-class signals.
A reasonable starter loop for a customer service agent build:
- Instrument every LLM call with Respan tracing including retrieval, tool-call, and escalation spans.
- Pull 200 to 500 production conversations into a dataset and label them for resolution quality, hallucination, and escalation correctness.
- Wire two or three evaluators that catch the failure modes you most fear (refund-policy fabrication, missed escalation, agent drift).
- Put your policy and escalation prompts behind the registry so you can version, A/B, and roll back without a deploy.
- Route through the gateway so per-customer spending caps and bounded-authority patterns are enforced at one place, not in every action handler; a sketch of this step follows the list.
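Step five of the loop as a sketch: routing through an OpenAI-compatible gateway usually amounts to pointing the standard client at a different base URL. The URL, header name, and model below are assumptions, not documented Respan values.

```python
# Sketch of routing through an OpenAI-compatible gateway. The base URL
# and metadata header are assumptions, not documented Respan values.
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.respan.ai/v1",   # hypothetical gateway endpoint
    api_key="RESPAN_API_KEY",                  # gateway key, not a model key
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",                       # any model the gateway fronts
    messages=[{"role": "user", "content": "Where is my refund?"}],
    extra_headers={"x-customer-id": "c1"},     # hypothetical: per-customer caps
)
print(resp.choices[0].message.content)
```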
That loop, running on real traffic, is the difference between Klarna's reversal and a system that holds up to a procurement security review.
To wire any of the patterns above on Respan, start tracing for free, read the docs, or talk to us.
Related reading
- Evaluating Customer Service LLMs: four-dimension eval framework
- Building a Customer Service Agent: full architecture walkthrough
- How Customer Support Teams Build LLM Apps in 2026: pillar overview
