Hermes

/ cost transparency · operator playbook

Why Your $0.07/Min Voice Agent Actually Costs $0.31: The 5-Invoice Problem

By Alfredo Romero, CEO, Hermes · May 13, 2026 · 13 min read

If you priced a voice-AI retainer against the $0.05 line on Vapi's pricing page or the $0.07 line on Retell's, your real per-minute cost is somewhere between $0.13 and $0.33, and you are reconciling up to five separate invoices every month to find out what it actually was. The orchestration platform is one charge. Speech-to-text is another. The LLM is a third. Text-to-speech is a fourth. The carrier (Twilio) is a fifth. The headline rate covers only the first of the five. This post is the math, the named line items, the expert quotes from the analysts who measured it, and the audit checklist to run on your next provider invoice. The short version: the agencies bleeding $500 to $3,000 a month in margin are not the ones who priced wrong. They are the ones who priced against the headline.

This is what wrapper resellers, GoHighLevel agencies, and direct-to-Vapi builders all hit at the same place: somewhere around client number three to five, the gap between the headline rate and the invoice total grows large enough to notice, and by then the retainers are locked at a price that assumed the headline.

The number this post will help you calculate is the one your accountant should be calculating already: total monthly cost of voice infrastructure divided by total billable minutes. If the answer is more than 1.5x your headline rate, you are operating on the wrong side of the 5-invoice problem.
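As a rough sketch of that calculation, assuming you already have one invoice total per vendor and a billable-minute count for the month. The function names and the sample invoice amounts below are illustrative, not real data:

```python
def real_cost_per_minute(invoice_totals, billable_minutes):
    """Total voice-infrastructure spend divided by billable minutes."""
    if billable_minutes <= 0:
        raise ValueError("billable_minutes must be positive")
    return sum(invoice_totals.values()) / billable_minutes


def exceeds_headline(real_rate, headline_rate, threshold=1.5):
    """True when the real rate is more than `threshold` times the headline rate."""
    return real_rate > threshold * headline_rate


# Hypothetical month: five invoices (USD) and 12,000 billable minutes.
invoices = {
    "platform": 840.00,
    "stt": 51.60,
    "llm": 960.00,
    "tts": 432.00,
    "telephony": 168.00,
}

rate = real_cost_per_minute(invoices, 12_000)
print(f"real cost: ${rate:.3f}/min")
print("over 1.5x headline:", exceeds_headline(rate, headline_rate=0.07))
```

Swap in your own invoice totals and the headline rate you priced against; the 1.5x threshold is the one used in the paragraph above.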

What is the 5-invoice problem, exactly?

A modern AI voice agent is not one product. It is five products stitched together at runtime by an orchestration layer. The orchestration layer is what platforms like Vapi, Retell, and Bland sell. The other four sit underneath and bill independently.

Per Trillet's 2026 voice AI white-label pricing breakdown, a single voice agent in production can generate "up to five separate invoices per deployment." That is not an outlier case. That is the default architecture once you pick a non-bundled platform.

The five invoices, in order of appearance on your card:

  1. Platform / orchestration. Vapi at $0.05/min, Retell at $0.07/min, Bland at $0.09/min. This is the line on the pricing page. It buys you the WebSocket, the call routing, the turn-taking logic, and the dashboard. It does not buy you a conversation.
  2. Speech-to-text (STT). Deepgram Nova-2 at roughly $0.0043/min for streaming transcription, per Retell's 2026 Vapi review. AssemblyAI and Whisper variants land in the same band.
  3. LLM tokens. GPT-4o averages $0.08 to $0.20/min on a typical conversation, per pxlpeak's Vapi pricing breakdown. GPT-4o mini, Claude Haiku, and Gemini Flash drop the line to under $0.05/min if your prompt is tight.
  4. Text-to-speech (TTS). ElevenLabs at $0.18 per 1,000 characters, which translates to $0.036 to $0.072/min when the agent speaks 200 to 400 characters per minute. Budget voice catalogs (Deepgram Aura, Azure Neural) land at roughly $0.011/min.
  5. Telephony. Twilio programmable voice at $0.014/min inbound on US local numbers, plus carrier surcharges and the A2P 10DLC monthly fee that Twilio documents at $1.50 to $10/month per campaign.

Stack them. The cheapest viable production setup (GPT-4o mini, Deepgram Nova-2, ElevenLabs Turbo, Twilio) clears about $0.15/min. A premium stack (GPT-4o, Deepgram, ElevenLabs Multilingual) clears $0.35/min and up. Dograh's 2025 Vapi cost analysis concluded that all-in Vapi pricing "typically lands between $0.23 and $0.33 per minute when all components are included."
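To make the stacking concrete, here is a minimal sketch that adds up the per-minute figures quoted in the list above. It assumes the Vapi headline rate for the platform line, takes "under $0.05/min" as $0.05 for the budget LLM, uses the low and high ends of the ElevenLabs range for the two stacks, and leaves out carrier surcharges and the monthly 10DLC fee:

```python
# Per-minute line items from the list above (USD per minute).
BUDGET_STACK = {
    "platform": 0.05,    # Vapi headline rate
    "stt": 0.0043,       # Deepgram Nova-2 streaming
    "llm": 0.05,         # GPT-4o mini class, taken at the upper bound
    "tts": 0.036,        # ElevenLabs, low end of the quoted range
    "telephony": 0.014,  # Twilio inbound, US local number
}

# Premium stack: same platform, STT, and carrier; pricier LLM and voice.
PREMIUM_STACK = {**BUDGET_STACK, "llm": 0.20, "tts": 0.072}


def all_in_rate(stack):
    """Sum of every per-minute line item in the stack."""
    return sum(stack.values())


def headline_share(stack):
    """Fraction of the all-in rate that the platform headline covers."""
    return stack["platform"] / all_in_rate(stack)


for name, stack in (("budget", BUDGET_STACK), ("premium", PREMIUM_STACK)):
    print(f"{name}: ${all_in_rate(stack):.4f}/min, "
          f"headline covers {headline_share(stack):.0%} of it")
```

Under those assumptions the budget stack comes to about $0.15/min and the premium stack to about $0.34/min, in line with the figures above.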

"Most teams discover the true per-minute cost only after their first invoice arrives." [Retell AI, 2026 Voice Agent Pricing Breakdown]

That is the entire problem in one sentence. The pricing page is not lying. It is selling you one of the five things you need to run a call.

How much margin is this actually costing my agency?

The clearest measurement I have seen on this question comes from the MSP buyer's guide published by Viirtue in March 2026. They audited reseller stacks across voice-AI providers and reported the compounding effect at scale.

"A 1.8% to 11.6% margin gap compounds fast. At 50 clients, agencies are looking at $500 to $3,000 per month in lost margin plus 5x more vendor management overhead." [Viirtue, 2026 MSP Buyer's Guide to AI Voice Billing]

Run the math against your own book. An agency at 10 clients with an average of 1,200 billable minutes per client per month moves 12,000 minutes through the stack. At a $0.07 headline rate, your cost-of-service line in the P&L reads $840. At the real $0.23/min all-in cost, the line is $2,760. That is $1,920 a month in margin you thought you had when you quoted the retainers.

Same math at 30 clients: $2,520 budgeted, $8,280 actual, $5,760 in monthly margin leakage. You are not running a $30K MRR agency. You are running a $24K MRR agency that thought it was a $30K MRR agency.
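The same arithmetic as a small function, so you can plug in your own client count and minutes. The function name is illustrative; the rates are the $0.07 headline and $0.23 all-in figures used above:

```python
def margin_leak(clients, minutes_per_client, headline_rate, real_rate):
    """Return (budgeted cost, actual cost, monthly leak) in USD."""
    minutes = clients * minutes_per_client
    budgeted = minutes * headline_rate
    actual = minutes * real_rate
    return budgeted, actual, actual - budgeted


for clients in (10, 30):
    budgeted, actual, leak = margin_leak(clients, 1_200, 0.07, 0.23)
    print(f"{clients} clients: budgeted ${budgeted:,.0f}, "
          f"actual ${actual:,.0f}, leak ${leak:,.0f}/mo")
```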

The vendor-management overhead in the Viirtue quote is the part most operators skip. Five invoices means five logins, five password rotations, five customer support escalation paths, five rate-card changes to keep up with, and five separate reconciliations every month before the agency can produce one clean bill for the client. The hours on this are real. Most agencies absorb them as founder time, which is the most expensive labor in the business.

Why does the headline rate not just include the rest?

Because the platforms that publish a $0.05 or $0.07 rate are orchestration providers, not voice-AI vendors. They route audio between four other vendors and take a margin on the routing. They cannot bundle the others without picking a provider for you, which would lose them the customers who want to bring their own LLM or voice catalog.

The clearest disclosure of this is on Vapi's own pricing page, which links to the official add-ons table and lists provider passthroughs explicitly. The fine print is honest. It just lives below the fold of the pricing page that every agency owner read once before signing up.

Retell took a different approach in 2026 and started publishing a "real cost" estimate of $0.13/min in its own marketing. That estimate is roughly accurate for its default stack, which bundles more than Vapi's but still expects the customer to add their own LLM key for anything above the default.

GoHighLevel sells Voice AI as a sub-account add-on at $0.06/min for the voice engine and bills LLM tokens separately, which lands at roughly $0.163/min average per Sympana's 2026 GoHighLevel Voice AI cost breakdown. Rebilling-with-markup is only available on the $497/month SaaS Pro plan, per HighLevel's own pricing documentation.

What does the real invoice math look like across platforms?

This is the side-by-side I assembled from public pricing pages and cited third-party audits: headline rate versus real all-in cost for a production-grade agent at 1,000 minutes a month.

Platform             | Headline rate                    | Real all-in (per audits) | Invoices to reconcile
Vapi                 | $0.05/min                        | $0.23 to $0.33/min       | 5
Retell               | $0.07/min                        | $0.13 to $0.18/min       | 2 to 3
GoHighLevel Voice AI | $0.06/min + LLM                  | $0.163/min avg           | 1 (sub-account)
Voicerr (post-hike)  | $199 to $299/mo + usage          | $0.21 to $0.28/min       | 2 to 3
Synthflow Agency     | $3,400/mo + usage                | $0.19 to $0.24/min       | 2
Hermes               | $149 to $699/mo · $0.24/min flat | $0.24/min flat           | 1

The "Real all-in" column is the average production cost per minute once STT, LLM, TTS, and telephony are layered in. Sources: Famulor's 2026 10-platform per-minute analysis, Gladly's total-cost-of-ownership breakdown, and the public pricing pages of each provider.

Why does this hit agencies harder than direct enterprise buyers?

Two structural reasons.

First, agencies sell retainers, not per-minute usage. The retainer pricing model assumes a stable cost-of-service line. The 5-invoice problem makes that line volatile. Two clients with the same retainer can have very different invoice profiles because their call lengths, model choice, voice catalog, and call concurrency are all different.

"If you only budget for the plan fee, you are probably underestimating your actual cost, and if you are an agency, the math changes fast." [Gladly, Hidden Costs of Voice AI for CX Leaders, 2026]

Second, agencies cannot pass through usage variance to the client the way an enterprise buyer can to its own budget owner. Once a retainer is quoted, every minute over plan eats margin. Most agencies on Vapi or Retell never built a usage-based markup clause into their MSA. They priced flat and absorbed the variance.

Treat the second reason as the harder problem. Variability is the real tax. Even if your average cost is fine, a bad month with a chatty new client can blow up an entire quarter's projected margin.

How do I audit my actual per-minute cost in 30 minutes?

This is the audit I run for every operator who applies to the Hermes Founders' Beta. You can run it on your own data without talking to me.

  1. Pull last month's invoices from every provider in the stack. Platform (Vapi/Retell/Bland), STT (Deepgram/AssemblyAI), LLM (OpenAI/Anthropic), TTS (ElevenLabs/Cartesia), telephony (Twilio/Telnyx). If you are on GoHighLevel, pull the sub-account usage report plus your Twilio sub-account bill.
  2. Add the totals. One number. This is total cost of voice infrastructure for the month.
  3. Pull total billable minutes processed. This is on the platform's analytics page. Make sure it matches what you billed clients for.
  4. Divide. The result is your real per-minute cost. Round to two decimal places. Write it down.
  5. Compare to your client retainer per-minute rate. If you charge a $1,500 flat retainer and the client used 1,200 minutes, your effective sell rate is $1.25/min. Subtract your real per-minute cost. That delta times monthly minutes is your gross margin per client.
  6. Repeat for every client and rank the results by gross margin. You will find a P20 client (near the bottom of the ranking) and a P80 client (near the top). The P20 is the margin leak. Either reprice that account or move it to a cheaper model.
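The steps above can be sketched in a few lines. This is a minimal illustration, not the audit tooling itself: the `ClientMonth` class, the three sample clients, and their numbers are hypothetical, and it assumes you have already split the combined provider invoices into a cost per client (for example from per-client usage reports).

```python
from dataclasses import dataclass


@dataclass
class ClientMonth:
    name: str
    retainer: float  # flat monthly retainer billed to the client, USD
    minutes: float   # billable minutes the client used this month
    cost: float      # this client's share of all provider invoices, USD

    @property
    def sell_rate(self) -> float:
        """Effective per-minute rate the client paid (step 5)."""
        return self.retainer / self.minutes

    @property
    def cost_rate(self) -> float:
        """Real per-minute cost of serving this client (step 4)."""
        return self.cost / self.minutes

    @property
    def gross_margin(self) -> float:
        """Per-minute delta times monthly minutes (step 5)."""
        return (self.sell_rate - self.cost_rate) * self.minutes


def rank_by_margin(clients):
    """Lowest gross margin first, so the leak is at the top (step 6)."""
    return sorted(clients, key=lambda c: c.gross_margin)


# Hypothetical book of three clients.
book = [
    ClientMonth("A", retainer=1500, minutes=1200, cost=276),
    ClientMonth("B", retainer=1500, minutes=4000, cost=1320),
    ClientMonth("C", retainer=2500, minutes=900, cost=162),
]

for c in rank_by_margin(book):
    print(f"{c.name}: sell ${c.sell_rate:.2f}/min, cost ${c.cost_rate:.2f}/min, "
          f"gross margin ${c.gross_margin:,.0f}")
```

In this made-up book, client B pays the same retainer as client A but uses more than three times the minutes on a pricier stack, so it sorts first as the account to reprice.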

If you want this run for you on a Loom, drop your invoices into VoiceBillAudit and I will return a side-by-side cost diff inside 48 hours.

What are operators saying about this on Reddit, community forums, and Trustpilot?

The complaint pattern is consistent. On the Vapi community forum, the most-cited operator frustration in Q1 2026 was unexplained cost spikes across LLM and TTS lines that did not correlate with a volume increase. The thread "Vapi calls dropped without customer explicitly ending the call" surfaced repeatedly in the support channel and is still open on Trieve KB issues.

In a recent Reddit roundup of r/AI_Agents discussion threads, the dominant theme is "compute pricing, token burn, plan caps, and model arbitrage" eating operator budgets before real work starts. Cache misses and runaway token use have become trust issues for operators trying to predict next month's bill.

On Synthflow's Trustpilot, the recurring operator complaint is cancelled subscriptions still billing for 5+ months and "calls are glitchy and the support does not help." Reliability and billing transparency compound. A platform that is flaky and opaque is the worst of both worlds for a retainer business.

How does Hermes solve the 5-invoice problem?

One invoice. Flat per-minute rate. Included minutes by plan. No passthroughs.

Hermes runs the upstream relationship. We pay the STT vendor, the LLM vendor, the TTS vendor, and Twilio. You pay Hermes. We negotiate volume rates the average operator cannot reach alone, we cache aggressively at the prompt level, and we publish one line per minute. $149/mo Starter with 300 included minutes, $399/mo Business with 1,000, $699/mo Agency with 2,000. Overage is $0.24/min flat at every tier.
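For budgeting, the plan structure above reduces to a base fee plus flat overage. A minimal sketch using only the tier figures just quoted; it assumes overage is billed per whole minute beyond the included allowance, with no proration or rollover, which the post does not spell out:

```python
# Plan name -> (monthly base fee in USD, included minutes), as quoted above.
PLANS = {
    "Starter": (149, 300),
    "Business": (399, 1_000),
    "Agency": (699, 2_000),
}
OVERAGE_CENTS_PER_MIN = 24  # $0.24/min flat at every tier


def monthly_bill(plan, minutes_used):
    """Base fee plus flat overage on minutes beyond the included allowance."""
    base, included = PLANS[plan]
    overage_minutes = max(0, minutes_used - included)
    return base + overage_minutes * OVERAGE_CENTS_PER_MIN / 100


print(monthly_bill("Business", 1_200))  # 200 overage minutes at $0.24
print(monthly_bill("Starter", 250))     # under the allowance, base fee only
```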

The 25% margin on overage is what funds the platform: the multi-tenant billing, the per-client number isolation, the white-label demo pages, the support team that picks up when your client's appointment-setter agent stops booking. We are honest about the margin because honest margins are the entire point of this category.

If you want to see exactly how the Hermes stack compares to your current setup, the side-by-sides live at Hermes vs Voicerr and Hermes vs Vapi + GHL stack. If you migrated off Voicerr after the May price hike, the 14-day migration playbook walks the cutover end to end.

What about the OpenAI Realtime-2 price drop? Does it fix this?

It moves the LLM line, not the other four. GPT-Realtime-2 cut inference cost from $32 to $0.40 per 1M tokens with caching, which collapses the LLM line item in the stack but leaves STT, TTS, telephony, and platform untouched. The 5-invoice problem is an architecture problem, not a model-cost problem. See our breakdown of what GPT-Realtime-2 actually changes for agency margins for the full re-pricing math.

The takeaway. Cheaper models help. They do not consolidate the vendor count. Until the stack collapses into one invoice, the 5-invoice problem stays.

Frequently asked questions

Why does my Vapi or Retell bill keep climbing past the headline rate?

The headline rate ($0.05 on Vapi, $0.07 on Retell) covers only the orchestration layer. Production calls also incur Speech-to-Text (around $0.0043/min for Deepgram Nova-2), LLM tokens ($0.05 to $0.20/min depending on model), Text-to-Speech ($0.011 to $0.072/min depending on voice quality), and telephony ($0.014/min on Twilio). The stack totals $0.13 to $0.33 per minute in real-world deployments.

What are the 5 invoices in the 5-invoice problem?

Platform (Vapi/Retell), Speech-to-Text provider (Deepgram, AssemblyAI), LLM provider (OpenAI, Anthropic), Text-to-Speech provider (ElevenLabs, Cartesia), and Telephony carrier (Twilio, Telnyx). Agencies reselling under a single client retainer have to reconcile all five into one customer bill every month, manually.

How much margin is the stack actually costing agencies per month?

Viirtue's 2026 MSP buyer's guide measured a 1.8% to 11.6% margin gap across reseller stacks. Their conclusion: at 50 clients, agencies are looking at $500 to $3,000 per month in lost margin plus 5x more vendor-management overhead compared to a single-platform model.

Is GoHighLevel Voice AI cheaper than running a Vapi or Retell stack?

Cheaper at the unit level, but only if you stay inside the $497/mo SaaS Pro plan. GHL Voice AI runs $0.06/min for the voice engine plus LLM tokens, landing at roughly $0.163/min average per Sympana's 2026 breakdown. You can rebill with markup, but only on the $497 tier, and only to your own sub-accounts. Cancellation, white-label limits, and call-quality issues are widely documented in GHL agency communities.

Why does Hermes charge $0.24/min flat when the underlying cost is lower?

$0.24/min covers our actual upstream cost (around $0.18/min for a tuned production stack with caching) plus a 25% margin. That margin is what funds the platform, the white-label demo pages, the multi-tenant billing, and the support team. We do not charge platform fees on top, we do not bill STT and TTS separately, and the rate is locked for Founders' Beta operators for the life of the account.
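Note that the 25% here is margin measured against the price, not markup on cost. A quick check using the two figures in the answer above (the function names are illustrative):

```python
def gross_margin(price, cost):
    """Profit as a fraction of the selling price."""
    return (price - cost) / price


def markup(price, cost):
    """Profit as a fraction of the underlying cost."""
    return (price - cost) / cost


price, cost = 0.24, 0.18  # USD per minute, from the answer above
print(f"margin on price: {gross_margin(price, cost):.0%}")
print(f"markup on cost:  {markup(price, cost):.0%}")
```

The same six cents per minute reads as a 25% margin or roughly a 33% markup, depending on which base you divide by.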

How do I audit my current voice-AI invoices to find the real per-minute cost?

Pull last month's invoice from every provider in your stack: platform, STT, LLM, TTS, telephony. Add the totals. Divide by total billable minutes. If the result is higher than the headline rate you priced your retainer against, you have a margin leak. Most agencies discover that their real cost is between 2x and 5x the headline rate.

Where this leaves you

The $0.05 line on a Vapi pricing page is not a lie. It is one fifth of the bill. The agencies that survive 2026 are the ones who priced against the real all-in number, not the headline. The fastest way to fix this in your own book is to run the audit above, find the leak per client, and either reprice the retainer or move the cost basis under one invoice.

By builders, for builders. The platform was built by operators who got the math wrong on the first three agencies they ran. Hermes is the math we wish we had on day one.

/ next step

Audit your real per-minute cost in under 48 hours

Drop last month's voice-AI invoices into the audit. We return a side-by-side cost diff with Hermes plus the leakiest client in your book. Free. No credit card.


Alfredo Romero is CEO of Hermes, the voice infrastructure platform for AI agencies.

Last reviewed May 2026