Cloud VM Benchmarks 2026: How to Choose the Right Infrastructure for Your SaaS

Tags: cloud infrastructure, AWS, GCP, benchmarks, DevOps
Kirill Latish

Choosing the right cloud infrastructure for your SaaS in 2026 is complex. This guide offers real benchmarks and insights to help you make informed decisions.

You have three browser tabs open. AWS pricing calculator in one, GCP in another, Azure in the third. The monthly estimates look surprisingly similar, and they're all wrong in different ways. The calculators don't account for ARM price-performance gaps, I/O throttling on smaller instances, or the fact that "4 vCPUs" means something measurably different on each provider.

Picking cloud infrastructure for a SaaS product in 2026 is harder than it should be. Marketing pages promise "best price-performance" without defining either term. Benchmark blogs from 2023 still rank on Google but reference instance types that no longer exist. Meanwhile, ARM processors quietly became the sensible default, GPU availability turned into a supply chain headache, and providers like Hetzner made the hyperscalers look expensive enough that people started doing the math.

So here's a proper breakdown: real cloud VM benchmarks from 2026, price-performance compared across workload types, and a decision framework for picking instances without spending a week on it. Specific instance families, real numbers, recommendations for SaaS teams running everything from Laravel APIs to GPU inference pipelines.

The 2026 Cloud Landscape — What's Changed

Two years ago, telling someone to default to ARM instances would get you a "yeah, maybe for stateless stuff." That hedge is gone. AWS Graviton4 now powers the R8g family with roughly 30% better compute performance over Graviton3. Google shipped the Axion processor, their first custom ARM chip for Cloud, and it benchmarks at up to 50% better performance and 60% better energy efficiency than comparable x86 instances. Azure's Cobalt 100 series (ARM Neoverse N2) rounds out the trio.

Every major cloud provider now treats ARM as the go-to for general-purpose compute. Not as an experiment, not as a cost-saving afterthought. The price-performance gap over x86 has widened enough that sticking with Intel or AMD for CPU-bound workloads requires a specific justification, not the other way around.

GPU availability changed the architecture conversation

Then there's the GPU situation. AWS P5 instances (NVIDIA H100), GCP A3 and A3 Ultra (H200), Azure ND H100 v5: all capacity-constrained in popular regions. If you're building an AI-heavy SaaS product, GPU procurement is now a supply chain problem. Teams that used to spin up GPU instances on demand are booking reserved capacity months ahead. Some are even diversifying across providers just to guarantee access, which would have seemed absurd three years ago.

Regional pricing wars and alternative providers

Here's the number that keeps coming up in infrastructure Slack channels: a Hetzner CAX31 (8 ARM vCPUs, 16 GB RAM) costs roughly €12.49/month. A comparable AWS c7g.2xlarge? About $97/month. That's a 7x price gap. Yes, you lose managed databases, IAM, and the broader ecosystem. But for workloads that don't need those things? Seven times is a lot.

The hyperscalers responded with more aggressive reserved pricing and expanded regions (AWS is at 36 regions now, GCP 41, Azure 63). Still, the unit economics gap is real for compute-heavy workloads that don't lean on platform integration. OVH and Vultr carved out similar niches. The era of "just use AWS for everything" is over for cost-conscious teams.

| Provider | ARM Instance Family | x86 Instance Family | GPU Instance Family | Regions |
|----------|---------------------|---------------------|---------------------|---------|
| AWS | R8g / C7g / M7g (Graviton4/3) | C7i / M7i (Intel), C7a / M7a (AMD) | P5 (H100), P4d (A100), Inf2 (Inferentia2) | 36 |
| GCP | C4A (Axion), N4A (Axion N3) | C4 (Intel), C4D (AMD Turin) | A3 (H100), A3 Ultra (H200), G2 (L4) | 41 |
| Azure | Dpsv6 / Epsv6 (Cobalt 100) | Dv6 (Intel), Dasv6 (AMD) | ND H100 v5, NC A100 v4 | 63 |
| Hetzner | CAX (Ampere Altra) | CX (Intel/AMD shared), CCX (dedicated) | N/A | 5 |

Benchmark Results by Workload Type

Aggregate benchmarks make great blog post titles and terrible purchasing decisions. A VM that tops Geekbench might choke on database I/O. The ecuadors.net cloud benchmark study (which hit 274 points on Hacker News) got attention precisely because it tested real workload profiles instead of synthetic scores. The results broke a few assumptions people had been carrying around.

Let's go workload by workload.

Web API servers (Laravel, Node.js, Django)

These are CPU-bound with memory pressure from connection pooling. The benchmarks ran identical application code across providers, measuring raw requests per second. Graviton4 on a c7g.xlarge pushed 23% higher throughput than the equivalent Intel c7i.xlarge, at 15% lower hourly cost. GCP's C4A running on Axion was even more impressive: 28% more requests per second versus N2 instances, with per-request costs dropping 31% thanks to both the performance jump and lower pricing.

What does that look like in dollars? A Laravel API handling 2,000-5,000 requests per second goes from roughly $380/month to $260 on AWS by switching to ARM. On GCP, the move drops you from about $350 to $240. No code changes required beyond recompiling native dependencies.
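To make those dollar figures comparable across providers, the useful unit is cost per million requests rather than cost per month. Here's a minimal sketch of that arithmetic using the article's AWS figures and an assumed steady 3,500 req/s midpoint of the 2,000-5,000 range; the numbers are illustrative, not provider quotes.

```python
# Back-of-the-envelope check of the ARM savings above.
# Monthly costs come from the article; 3,500 req/s is an assumed midpoint.

SECONDS_PER_MONTH = 30 * 24 * 3600

def cost_per_million_requests(monthly_cost: float, req_per_sec: float) -> float:
    """Dollars per one million requests at a sustained request rate."""
    monthly_requests = req_per_sec * SECONDS_PER_MONTH
    return monthly_cost / monthly_requests * 1_000_000

aws_x86 = cost_per_million_requests(380, 3500)  # c7i-based estimate
aws_arm = cost_per_million_requests(260, 3500)  # c7g-based estimate

savings = 1 - aws_arm / aws_x86
print(f"x86: ${aws_x86:.4f}/M req, ARM: ${aws_arm:.4f}/M req, saving {savings:.0%}")
```

The per-request framing also makes it obvious that the ARM switch is a ~32% cut regardless of traffic level, since both the numerator and the workload stay fixed.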

Database workloads

Different story here. I/O latency matters far more than raw CPU. The benchmarks ran PostgreSQL pgbench and MySQL sysbench, and GCP's C4D instances (AMD Turin) came out on top for throughput: 55% more queries per second on MySQL, 35% higher ops/sec on Redis versus previous-gen C3D. That 4.1 GHz boost clock and improved IPC really show up in query-heavy scenarios. On the AWS side, R8g instances (Graviton4, memory-optimized) delivered a 22% PostgreSQL TPS improvement over R7g, especially on write-heavy workloads where DDR5 bandwidth pulls its weight.

But here's what surprised people: network-attached storage is still the bottleneck for latency-sensitive queries, even in 2026. Instances with local NVMe (GCP C4D-lssd or AWS i4i) cut p99 query latency by 40-65% compared to remote block storage. You pay more and accept durability trade-offs, but if your users feel that tail latency, local disks are worth every cent.
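Tail latency is the metric that exposes the storage difference, so it's worth computing p99 from your own benchmark samples rather than trusting averages. A minimal nearest-rank percentile helper, with synthetic latency samples invented purely to show the shape of the effect (same median, fatter tail on remote storage):

```python
# Nearest-rank percentile for comparing storage tiers. Feed it raw
# query latencies (in ms) collected from your own pgbench/sysbench run.

def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile; pct in (0, 100]."""
    ordered = sorted(samples)
    rank = max(0, int(round(pct / 100 * len(ordered))) - 1)
    return ordered[rank]

# Synthetic data: identical medians, but remote block storage has a
# heavier tail. These numbers are illustrative, not measured.
local_nvme  = [1.2] * 97 + [3.0, 4.5, 6.0]
network_ebs = [1.3] * 97 + [6.0, 9.0, 12.0]

for name, data in [("local NVMe", local_nvme), ("network block", network_ebs)]:
    print(f"{name}: p50={percentile(data, 50):.1f} ms, p99={percentile(data, 99):.1f} ms")
```

In this toy dataset the p50s are nearly identical while p99 halves with local disks, which is exactly why averaged benchmarks miss the difference users actually feel.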

AI inference

GPU benchmarks split cleanly by model size, and the winner changes depending on which side of that split you're on. For models under 7B parameters, AWS Inferentia2 (Inf2) instances had the lowest cost per 1,000 inferences at roughly $0.012, compared to $0.045 on NVIDIA A10G (G5) instances. That's nearly 4x cheaper for the same work.

Larger models flip the picture. For 13B-70B parameter models, H100-based instances (P5 on AWS, A3 on GCP) achieved 3.2x higher throughput despite costing 4x more per hour. Better GPU utilization at scale actually makes the expensive hardware cheaper per inference. Cold start times are the other gotcha: GCP A3 instances spun up in 45-90 seconds, while Azure ND H100 v5 took 2-4 minutes in constrained regions. If you're serving a real-time inference API, pre-warmed reserved capacity isn't optional.
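The small-model numbers are easier to reason about as a monthly bill. A quick sketch using the article's cost-per-1,000-inferences figures and a hypothetical 50M-inferences/month SaaS volume (the volume is an assumption, the unit costs are the article's benchmark estimates):

```python
# Sanity-checking the sub-7B inference economics above.
COST_PER_1K = {
    "AWS Inf2 (Inferentia2)": 0.012,  # article's figure
    "AWS G5 (A10G)": 0.045,           # article's figure
}

monthly_inferences = 50_000_000  # hypothetical volume

for instance, cost in COST_PER_1K.items():
    monthly = monthly_inferences / 1000 * cost
    print(f"{instance}: ${monthly:,.0f}/month")

ratio = COST_PER_1K["AWS G5 (A10G)"] / COST_PER_1K["AWS Inf2 (Inferentia2)"]
print(f"G5 costs {ratio:.2f}x as much as Inf2 per inference")
```

At that volume the gap is $600/month versus $2,250/month for identical work, which is why "nearly 4x" stops being an abstraction once an AI feature ships.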

Background job processing

Video transcoding, PDF generation, data pipelines. These workloads run hot for hours and care about one thing: cost per compute-hour. ARM dominates. Hetzner's CAX41 (16 ARM vCPUs, 32 GB) at €24.49/month delivers roughly 4-5x more compute per dollar than equivalent AWS or GCP instances for sustained work.

More and more teams run a split setup: Kubernetes on Hetzner handles batch processing, a hyperscaler handles the latency-sensitive API layer. It's more operational overhead, sure, but when your background processing bill drops by 75%, that overhead starts looking reasonable.

| Workload Type | Best Price-Performance | Best Absolute Performance | Cost per Unit (Relative) |
|---------------|------------------------|---------------------------|--------------------------|
| Web API (req/s per $) | GCP C4A (Axion) | GCP C4D (AMD Turin) | C4A: 1.0x / C4D: 1.3x / AWS c7g: 1.1x |
| Database (QPS per $) | AWS R8g (Graviton4) | GCP C4D-lssd (local NVMe) | R8g: 1.0x / C4D: 1.15x / Azure Eps: 1.2x |
| AI Inference (<7B) | AWS Inf2 (Inferentia2) | GCP A3 (H100) | Inf2: 1.0x / G5: 3.8x / A3: 2.1x |
| AI Inference (13B+) | GCP A3 (H100) | AWS P5 (H100) | A3: 1.0x / P5: 1.1x / ND H100: 1.25x |
| Background Jobs (CPU-hr/$) | Hetzner CAX (ARM) | GCP C4D-384 | Hetzner: 1.0x / AWS c7g: 4.2x / GCP C4A: 3.8x |

Cost Optimization Strategies That Actually Work

Knowing which instances to pick is half the job. How you pay for them is the other half, and most SaaS teams leave 40-70% on the table. That's not a rounding error. On a $15k/month compute bill, we're talking about enough savings to hire another engineer.

Spot and preemptible instances for batch workloads

Spot instances get you 60-90% off, but AWS can pull the rug with two minutes' notice. For batch AI inference, video processing, data pipelines? That trade-off is easy. Design your jobs to be idempotent. If a Spot instance vanishes mid-task, the job restarts on a new instance, no data lost. AWS Batch, GCP Cloud Run Jobs, or a self-managed BullMQ queue all handle this well.

Worth knowing: Graviton4 Spot prices have been surprisingly stable in 2026. Interruption rates for c7g and m7g families in US regions stayed below 5%. GCP Spot VMs on C4A run at a 60-70% discount with similar reliability. These aren't the lottery tickets they used to be.
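The idempotency requirement above is mostly about checkpointing: a restarted worker must be able to tell which items are already done. Here's a deliberately minimal sketch using a local JSON checkpoint file; the names (`process_item`, `done_items.json`) are illustrative, and a real setup would checkpoint to S3/GCS and pull work from SQS or BullMQ instead.

```python
# Interruption-tolerant batch processing sketch for Spot instances.
# If the instance is reclaimed mid-run, a fresh instance re-runs run()
# and skips everything recorded in the checkpoint.
import json
from pathlib import Path

CHECKPOINT = Path("done_items.json")  # illustrative; use durable storage in production

def load_done() -> set[str]:
    if CHECKPOINT.exists():
        return set(json.loads(CHECKPOINT.read_text()))
    return set()

def mark_done(done: set[str], item_id: str) -> None:
    done.add(item_id)
    CHECKPOINT.write_text(json.dumps(sorted(done)))

def process_item(item_id: str) -> None:
    pass  # transcoding / inference / ETL step goes here

def run(items: list[str]) -> int:
    """Process items, skipping any completed before an interruption."""
    done = load_done()
    processed = 0
    for item_id in items:
        if item_id in done:
            continue  # idempotency: finished work is never redone
        process_item(item_id)
        mark_done(done, item_id)
        processed += 1
    return processed
```

Checkpointing after every item is the conservative choice; for high-throughput queues you'd batch the writes, trading a little duplicate work on interruption for less I/O.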

Reserved instances vs. savings plans

Reserved Instances lock you into a specific instance type for 1-3 years. Savings Plans (AWS) and Committed Use Discounts (GCP, Azure) do something smarter: you commit to a dollar-per-hour spend instead of a specific machine. Both get you 30-60% off. Which one? If you're confident your instance family won't change, RIs give slightly better discounts. If you might migrate from x86 to ARM next year (and you probably should), Savings Plans give you room to move.

The sweet spot for most SaaS products: commit to 60-70% of your steady-state compute through a Savings Plan, cover the rest with on-demand or Spot. A mid-stage SaaS spending $15,000/month on compute typically drops to $8,500-9,500/month with this approach.
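That $8,500-9,500 range falls out of simple blending arithmetic. A sketch of the model, where the discount rates (45% for the commitment, 70% for Spot) and the Spot share of the remainder are assumptions in the typically published ranges, not quotes:

```python
# Rough model of the 60-70% commitment strategy described above.
# All discount percentages are assumptions, not provider pricing.

def blended_monthly(base: float,
                    commit_frac: float = 0.65,      # share covered by Savings Plan
                    commit_discount: float = 0.45,  # assumed plan discount
                    spot_frac_of_rest: float = 0.4, # batch share of the remainder
                    spot_discount: float = 0.70) -> float:
    committed = base * commit_frac * (1 - commit_discount)
    rest = base * (1 - commit_frac)
    spot = rest * spot_frac_of_rest * (1 - spot_discount)
    on_demand = rest * (1 - spot_frac_of_rest)
    return committed + spot + on_demand

print(f"${blended_monthly(15_000):,.0f}/month from a $15k on-demand baseline")
```

With these assumptions the $15k baseline lands just above $9,100/month, inside the range quoted above; the lever that matters most is the commitment fraction, which is why over-committing past steady state is the expensive mistake.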

Right-sizing: the boring one that saves the most

Nobody wants to talk about right-sizing. It's not glamorous. But 40-60% of cloud instances are over-provisioned, according to basically every industry report published in the last three years. That 4-vCPU instance humming along at 12% CPU utilization? You're paying for a sports car and driving it to the corner store.

AWS Compute Optimizer and GCP's Active Assist both flag these. Kubecost does it for Kubernetes clusters. Check the recommendations once a quarter, resize, move on. The compound savings are significant.

A real example: a team running Laravel + Next.js had three c5.2xlarge instances at $490/month each. After profiling showed 65% idle CPU, they switched to two c7g.xlarge Graviton instances at $172/month each. Monthly compute went from $1,470 to $344. That's a 77% cut. And p95 response times actually got 11% faster because Graviton's single-threaded performance is genuinely good. This kind of architecture-aware optimization is exactly what you'd dig into in a highload software architecture context.
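The arithmetic in that example checks out, and it's worth keeping this two-line calculation handy when evaluating any right-sizing proposal:

```python
# Verifying the right-sizing example's numbers (article's figures).
before = 3 * 490  # three c5.2xlarge at $490/month each
after = 2 * 172   # two c7g.xlarge at $172/month each
savings_pct = (1 - after / before) * 100
print(f"${before} -> ${after}/month, {savings_pct:.0f}% saved")  # 77% saved
```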

Multi-cloud with Kubernetes

Running Kubernetes across multiple providers sounds like a recipe for sleepless nights. For most teams, it is. But above $10,000/month in compute, the pattern starts making financial sense: keep your primary workloads (and data) on one provider, burst batch processing to the cheapest option. Hetzner for background jobs, GCP when you need GPUs, AWS for the main API and managed services. Terraform and Crossplane make multi-cluster provisioning repeatable enough that it's not a heroic effort anymore.

A Decision Framework for Your Stack

All of this data is useless if you can't turn it into an instance type in your Terraform config. The recommendations below map common SaaS workload profiles to specific instance picks. They won't replace benchmarking your actual app, but they'll get you 80% of the way there in five minutes.


Recommendations for a Laravel + Next.js SaaS

Since this is probably the most common stack people reading this article are running, let's get specific. Laravel API: start with AWS c7g.medium (1 ARM vCPU, 2 GB) or GCP c4a-standard-2. That's $30-45/month per instance. Scale horizontally behind an ALB or Cloud Load Balancer. For the Next.js frontend with SSR, similar sizing works, though if your rendering is mostly stateless, Vercel or Cloudflare Pages will cost less than VMs at moderate traffic levels.

PostgreSQL goes on a memory-optimized instance: AWS r8g.large or GCP n4a-highmem-4. Use managed services (RDS, Cloud SQL) unless you have a dedicated DBA. That 30-40% markup over raw VMs pays for itself in ops hours before you finish your first on-call rotation. Redis for caching and queues: a small managed instance (ElastiCache, Memorystore) covers most SaaS workloads unless you're pushing thousands of jobs per second.

GPU instances for AI-heavy workloads

Most SaaS products ship AI features now. If yours does, GPU instance selection deserves its own planning session. Real-time inference APIs (chatbots, image generation, embeddings) should optimize for latency: GCP G2 with NVIDIA L4 gives the best cost-per-inference for sub-7B models. Batch workloads (fine-tuning, bulk embedding generation) want throughput: H100 instances on Spot pricing. If you're still figuring out where AI fits in your product roadmap versus just the infrastructure, the AI Product Manager course offers a useful framework for scoping those decisions before you commit to GPU spend.

When to use managed services vs. raw VMs

Raw VMs make sense for custom kernel tuning, specific hardware requirements (local NVMe, GPUs), or teams with solid ops chops. For everyone else, managed services win. ECS/Fargate, Cloud Run, App Engine: they cost 20-40% more per compute-hour, but you get automated scaling, patching, and monitoring baked in. For teams under 10 engineers, the engineering time you save by not managing infrastructure almost always exceeds the price premium. Almost always.

Conclusion

Provider loyalty is an expensive habit. The benchmarks show clear winners by workload type, and they aren't the same provider every time. ARM instances (Graviton4, Axion, Cobalt 100) should be your starting point for CPU-bound work, with 20-30% better price-performance than x86 across the board. GCP leads on raw web serving throughput with C4D. AWS has the deepest managed services ecosystem and Inferentia wins for small-model inference. Hetzner demolishes everyone on raw compute cost when you don't need the platform integrations.

None of these numbers replace testing with your actual workload. pgbench is not your app. Run your real code on two or three candidate instance types for a week. Measure p50, p95, p99. Calculate cost per request and cost per transaction, because cost per hour is a vanity metric for SaaS infrastructure.

If you take one thing from this article: start with ARM, right-size ruthlessly, use Spot for batch work, and commit to 60-70% of your base load. That alone puts you ahead of most teams still running on whatever instance type someone picked during the initial launch scramble. For the architectural thinking behind these choices, the Highload Software Architecture course goes deep on building systems that don't fall over when traffic spikes.
