An Open-Source Benchmark Now Compares 4 FHIR Servers Side by Side

How often does anyone actually sit four FHIR servers next to each other on the same hardware and run the same load against all of them? Not often, and the gap has been visible in every procurement conversation for years. That gap got smaller on June 29, when Health Samurai published a public performance benchmark covering four servers, with the test harness in the open and the dashboard rerunning daily.

The team behind the benchmark, led by Marat Surmashev, VP of Engineering at Health Samurai, put HAPI FHIR, Medplum, the Microsoft FHIR Server, and Aidbox on one bare-metal machine. Honest disclosure up front: Health Samurai builds Aidbox, so this is a vendor-run benchmark, not an independent audit. What makes it usable is that the repo is open and the daily rerun makes any number you see today checkable tomorrow.

What the Setup Looks Like

Each server got the same slice of the box: 8 vCPU and 24 GB of RAM, with Medplum running as 8 single-core replicas to match its native scaling pattern. The data was Synthea, 1,000 synthetic patients, around 2 million resources after import. Load was driven by Grafana k6, and the harness runs daily so the snapshot you read this week is not stale next week.

The hardware is one server with 64 cores and 500 GB of RAM. PostgreSQL 18 sits under Aidbox, HAPI, and Medplum. The Microsoft FHIR Server runs against SQL Server 2022 Developer Edition.

CRUD Throughput

The CRUD test exercises create, read, update, and delete on nine resource types under sustained load. The numbers from the June 29 run:

Aidbox: about 5,212 RPS
HAPI FHIR: about 3,058 RPS
Medplum: about 1,420 RPS
Microsoft FHIR Server: about 440 RPS

The spread is over 11x between top and bottom (5,212 / 440 ≈ 11.8). For anyone working through FHIR fundamentals or broader integration planning, this kind of head-to-head is useful even before any team plugs the numbers into a sizing exercise.
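To put the CRUD figures on a common scale, here is a minimal sketch that expresses each server's throughput as a multiple of the slowest one. The numbers are the approximate June 29 values quoted above; the function and variable names are illustrative and not part of the benchmark harness.

```python
# CRUD throughput from the June 29 run, in requests per second (approximate).
crud_rps = {
    "Aidbox": 5212,
    "HAPI FHIR": 3058,
    "Medplum": 1420,
    "Microsoft FHIR Server": 440,
}


def relative_to_slowest(rps_by_server):
    """Return each server's throughput as a multiple of the slowest server."""
    slowest = min(rps_by_server.values())
    return {name: rps / slowest for name, rps in rps_by_server.items()}


ratios = relative_to_slowest(crud_rps)

# Print from fastest to slowest.
for name, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
    print(f"{name}: {ratio:.1f}x")
```

Because the dashboard reruns daily, the dictionary is the only part that needs updating to recompute the spread for a later snapshot.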

Bundle Import

Bundle import is what migration projects actually feel. The benchmark reports resources per second during a sustained Synthea import:

Aidbox: about 2,678 res/sec
HAPI FHIR: about 2,214 res/sec
Medplum: about 764 res/sec
Microsoft FHIR Server: about 448 res/sec

HAPI sits closer to the top here than it does on CRUD, which is consistent with how the project has invested in batch ingestion. If you are moving Synthea-shaped data into a fresh FHIR backend, the import number is the one to watch first.
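As a rough illustration of what these rates mean for a migration, the sketch below converts resources per second into wall-clock minutes. It assumes the reported rate holds for the entire import, which real migrations may not achieve, and the resource count is a hypothetical input set here to roughly the size of the benchmark dataset.

```python
# Bundle import rates from the June 29 run, in resources per second (approximate).
import_rate = {
    "Aidbox": 2678,
    "HAPI FHIR": 2214,
    "Medplum": 764,
    "Microsoft FHIR Server": 448,
}

# Hypothetical migration size, roughly matching the benchmark's ~2 million resources.
RESOURCE_COUNT = 2_000_000


def estimated_import_minutes(resource_count, resources_per_second):
    """Naive estimate: assumes the sustained rate never changes during the import."""
    return resource_count / resources_per_second / 60


estimates = {
    name: estimated_import_minutes(RESOURCE_COUNT, rate)
    for name, rate in import_rate.items()
}

for name, minutes in estimates.items():
    print(f"{name}: ~{minutes:.0f} min")
```

Treat the result as an order-of-magnitude guide: data shaped differently from Synthea output, or hardware unlike the benchmark box, will shift the numbers.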

Storage Footprint After the Same Load

After all four servers load the same ~2 million resources, disk usage differs by more than 5x between the smallest and largest footprint.

Microsoft FHIR Server: 4.24 GB
Aidbox: 6.83 GB
Medplum: 11.8 GB
HAPI FHIR: 22.6 GB

The note in the benchmark explains the spread. HAPI, Medplum, and Microsoft prebuild search indexes on write, so the footprint includes those indexes by default. Aidbox ships without default search indexes, which makes import faster and storage smaller, with the trade-off that the operator decides which indexes to create. There is no free lunch hiding in the disk numbers; the indexing strategy lives there too.
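One way to compare footprints is average bytes on disk per resource. The sketch below does that arithmetic under two assumptions that are not stated in the benchmark: the resource count is taken as exactly 2 million (the article says "around"), and GB is read as decimal gigabytes (10^9 bytes).

```python
# Disk usage after loading the same dataset, in GB (from the June 29 run).
storage_gb = {
    "Microsoft FHIR Server": 4.24,
    "Aidbox": 6.83,
    "Medplum": 11.8,
    "HAPI FHIR": 22.6,
}

RESOURCE_COUNT = 2_000_000  # assumption: "around 2 million" taken as exact
BYTES_PER_GB = 10**9        # assumption: decimal gigabytes


def bytes_per_resource(total_gb, resource_count):
    """Average on-disk bytes per resource, including any indexes the server built."""
    return total_gb * BYTES_PER_GB / resource_count


per_resource = {
    name: bytes_per_resource(gb, RESOURCE_COUNT) for name, gb in storage_gb.items()
}

for name, size in per_resource.items():
    print(f"{name}: ~{size:,.0f} bytes per resource")

# Ratio between the largest and smallest footprint.
spread = max(storage_gb.values()) / min(storage_gb.values())
print(f"largest / smallest: {spread:.1f}x")
```

The per-resource figure blends raw data and index overhead, so it describes each server's default configuration, not its storage engine in isolation.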

Why a Daily Public Rerun Matters

Most vendor benchmarks land as a PDF with a publication date and no path to verify. A repo that reruns every day and posts the snapshot to a public dashboard turns that posture inside out. If a vendor pushes an optimization, the next day's run shows it. If a test is wrong, anyone can open an issue.

For teams picking a FHIR backend for a form-builder pipeline, "the best FHIR form engines for EHR integration in 2026" walks through how the backend choice shapes the form runtime. Once a server choice is on the table, "the complete guide to FHIR form builders in 2026" is the next stop.

— Rebecca Ostrowski
