A key-value HTTP service designed for per-key atomicity, optimistic concurrency control, and horizontal scaling.
The single-node service is an HTTP server backed by an in-memory sharded concurrent hash map. The map distributes entries across multiple lock-protected segments (shards), which gives us two properties:
- Per-key serialization — writes to the same key acquire the shard's write lock via an entry API, so concurrent mutations on the same key are serialized.
- Cross-key parallelism — keys that hash to different shards proceed concurrently with no contention.
The shard count is configurable at startup (`SHARD_AMOUNT` env var) and must be a power of two.
All mutations go through a single mutate method that uses the map's entry API:
- Occupied entry — holds an exclusive reference to the entry. The version guard (`ifVersion`) is checked while the lock is held, so there is no gap between reading the version and applying the write.
- Vacant entry — inserts a new item at version 1. A vacant entry with a version guard is rejected (returns 409).
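The occupied/vacant handling above can be sketched with the standard library's `HashMap` entry API; the `Store`, `Item`, and `mutate` names here are illustrative (the real service may use a concurrent-map crate), but the guard-under-lock logic is the same:

```rust
use std::collections::HashMap;
use std::collections::hash_map::{DefaultHasher, Entry};
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

struct Item { value: String, version: u64 }

struct Store { shards: Vec<Mutex<HashMap<String, Item>>> }

impl Store {
    fn new(shard_count: usize) -> Self {
        assert!(shard_count.is_power_of_two());
        Store { shards: (0..shard_count).map(|_| Mutex::new(HashMap::new())).collect() }
    }

    fn shard_for(&self, key: &str) -> &Mutex<HashMap<String, Item>> {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        // power-of-two shard count lets us mask instead of mod
        &self.shards[(h.finish() as usize) & (self.shards.len() - 1)]
    }

    /// Ok(new_version) on success; Err(()) on a failed guard (mapped to HTTP 409).
    fn mutate(&self, key: &str, value: String, if_version: Option<u64>) -> Result<u64, ()> {
        // shard write lock held for the whole mutation
        let mut map = self.shard_for(key).lock().unwrap();
        match map.entry(key.to_string()) {
            Entry::Occupied(mut e) => {
                let item = e.get_mut();
                // guard checked while the lock is held: no gap between
                // reading the version and applying the write
                if matches!(if_version, Some(v) if v != item.version) {
                    return Err(());
                }
                item.value = value;
                item.version += 1;
                Ok(item.version)
            }
            Entry::Vacant(slot) => {
                // a version guard on a missing key is rejected
                if if_version.is_some() {
                    return Err(());
                }
                slot.insert(Item { value, version: 1 });
                Ok(1)
            }
        }
    }
}
```

The mask in `shard_for` is why the shard count must be a power of two.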
Each key carries a monotonically increasing integer version, starting at 1 on creation and incremented on every successful mutation. Clients can pass `?ifVersion=<n>` on PUT or PATCH as an optimistic check:
- If the current version matches, the write proceeds and version increments.
- If it does not match (or the key doesn't exist for a versioned PUT), the server returns 409 Conflict with no mutation.
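From the client's side, this check is the core of a compare-and-swap retry loop. A minimal sketch, where `get` and `put_if_version` are hypothetical stand-ins for the `GET /kv/{key}` and guarded `PUT` calls:

```rust
/// Read-modify-write increment with optimistic locking.
/// `get` models GET /kv/{key} → Some((value, version)) or None.
/// `put_if_version` models PUT ?ifVersion=N; false means 409 Conflict.
fn increment_with_retry(
    get: &dyn Fn() -> Option<(i64, u64)>,
    put_if_version: &dyn Fn(i64, Option<u64>) -> bool,
) -> i64 {
    loop {
        match get() {
            Some((value, version)) => {
                // guard on the exact version we just read
                if put_if_version(value + 1, Some(version)) {
                    return value + 1;
                }
                // 409: another writer won; retry from the read
            }
            None => {
                // key absent: unguarded PUT creates it at version 1
                if put_if_version(1, None) {
                    return 1;
                }
            }
        }
    }
}
```

On conflict the loop re-reads, so no increment is ever lost; only wasted round-trips are possible.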
| Method | Path | Description |
|---|---|---|
| GET | `/kv/{key}` | Returns `{ key, value, version }` or 404 |
| GET | `/kv` | Lists all keys (JSON array) |
| PUT | `/kv/{key}?ifVersion=N` | Full replace. Optional optimistic lock via `ifVersion` query param |
| PATCH | `/kv/{key}?ifVersion=N` | Upsert/merge. If both existing and incoming values are JSON objects, shallow-merges top-level fields; otherwise replaces entirely. Optional optimistic lock via `ifVersion` query param |
| Existing value | Incoming delta | Result |
|---|---|---|
| `{"a":1}` | `{"b":2}` | `{"a":1,"b":2}` (shallow merge) |
| `{"a":1}` | `"hello"` | `"hello"` (type mismatch → replace) |
| Key absent | any value | Created with delta as value, version 1 |
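The merge rule in the table is small enough to state in code. This sketch uses a deliberately simplified JSON model (the real service would use a full JSON type such as `serde_json::Value`), but the decision logic matches the table:

```rust
use std::collections::BTreeMap;

/// Simplified JSON value, just enough to show the merge rule.
#[derive(Clone, Debug, PartialEq)]
enum Json {
    Num(i64),
    Str(String),
    Obj(BTreeMap<String, Json>),
}

/// PATCH semantics: shallow-merge when both sides are objects,
/// otherwise replace entirely.
fn patch(existing: Json, delta: Json) -> Json {
    match (existing, delta) {
        (Json::Obj(mut base), Json::Obj(incoming)) => {
            for (k, v) in incoming {
                base.insert(k, v); // incoming top-level fields win on collisions
            }
            Json::Obj(base)
        }
        (_, delta) => delta, // type mismatch → replace
    }
}
```

Note the merge is shallow by design: nested objects under a colliding key are replaced wholesale, not recursively merged.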
- Shard-level vs per-key locks: shard-level locking is coarser than a separate lock per key, so two keys in the same shard can end up waiting on each other. The upside is avoiding the memory cost of maintaining a lock per key. For this workload that tradeoff is worth it, since a reasonable shard count keeps contention low.
Unit tests cover:
- Basic CRUD (get, set, merge on present/absent keys)
- Version guard happy path and stale-version rejection
- Merge semantics (object extend, type mismatch overwrite, vacant key insert)
- Concurrent sets on same key (100 tasks, assert version == 100)
- Concurrent sets on different keys (100 tasks, each key at version 1)
- Concurrent merges on same key (100 tasks, assert version == 101)
- Optimistic lock contention (10 concurrent CAS attempts, exactly 1 wins)
- Mixed concurrent reads and writes (50 writers + 50 readers, no torn reads)
Integration test:
- Spins up a real server on a random port.
- 3 HTTP clients each perform 100 read-modify-write increments on the same key using optimistic locking (`?ifVersion=N`). On 409, the client retries from the read.
- Asserts the final counter value is exactly 300, proving no lost updates.
```sh
# Start the server
ADDRESS=0.0.0.0:8080 SHARD_AMOUNT=64 cargo run -p kvstore

# Run tests
cargo test -p kvstore
```

A separate router process sits in front of N storage node processes. The router is a stateless HTTP reverse proxy that deterministically maps each key to exactly one backend node using hash-based partitioning (hash of the key mod the number of nodes).
How routing works:
- For `GET/PUT/PATCH /kv/{key}`: the router hashes the key, selects the owning node, and proxies the request (including query params and body) to that node's `/kv/{key}` endpoint.
- For `GET /kv` (list all keys): the router fans out to every node's `/kv` endpoint in parallel, collects each node's key list, and streams back NDJSON (`{"key":"some-key","node":"node-id"}` per line).
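The routing decision itself is one line of arithmetic. A sketch of the hash-mod mapping (the `owner_for` name is illustrative, not the repo's actual function):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Deterministic key → node mapping: hash the key, mod the node count.
/// Every request for a given key therefore lands on the same backend.
fn owner_for<'a>(key: &str, nodes: &'a [String]) -> &'a str {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    &nodes[(h.finish() as usize) % nodes.len()]
}
```

Because the mapping is a pure function of the key and the node list, the router needs no routing table and no state.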
Why a router over client-side partitioning: a central router keeps the partitioning logic in one place — clients don't need to know about the topology. It also makes it easy to add the list-all-keys fan-out without client changes.
Since each key lives on exactly one node, per-key atomicity and versioning are preserved — the owning node handles all concurrency for its keys using the same shard-level entry locking from Part 1. The router is stateless and does no caching.
- Single point of failure: all traffic flows through the one router process; if it goes down, the whole cluster is unreachable.
- Static node list: nodes are configured at router startup. Adding or removing a node rehashes keys, so data would need to be migrated.
- Fan-out for list keys: `GET /kv` must query all nodes, so latency is bounded by the slowest node.
Integration tests spin up 3 kvstore nodes and a router in-process on random ports. They cover:
- Basic CRUD through the router (PUT, GET, overwrite)
- 404 for missing keys
- Optimistic locking on PUT (`ifVersion` match → 200, mismatch → 409)
- PATCH merge semantics (shallow object merge) and version guard (409 on mismatch)
- Fan-out `GET /kv` returns NDJSON with all keys across all nodes
- Routing determinism (same key always hits the same node)
- Concurrent increments (3 clients × 50 read-modify-write loops with optimistic locking, final value = 150)
```sh
# Run router integration tests
cargo test -p kvstore_router

# Run all tests (kvstore + router)
cargo test
```

```sh
# Terminal 1-3: start 3 storage nodes
ADDRESS=0.0.0.0:7001 SHARD_AMOUNT=64 cargo run -p kvstore
ADDRESS=0.0.0.0:7002 SHARD_AMOUNT=64 cargo run -p kvstore
ADDRESS=0.0.0.0:7003 SHARD_AMOUNT=64 cargo run -p kvstore

# Terminal 4: start the router
ADDRESS=0.0.0.0:7000 NODES=http://127.0.0.1:7001,http://127.0.0.1:7002,http://127.0.0.1:7003 cargo run -p kvstore_router
```

```sh
# Start the full cluster (3 nodes + router)
docker compose up --build

# Stop the cluster
docker compose down
```

Part 2 uses hash(key) % N to assign keys to nodes. This works with a fixed node count, but adding or removing a node basically requires a full data migration. In most production systems, we'd need the ability to scale the cluster up or down without major disruption.
With simple modulo hashing and 3 nodes, adding a 4th node changes the assignment of ~75% of keys:
```
key "user:42" → hash % 3 = Node 1 → hash % 4 = Node 2 (moved)
key "user:99" → hash % 3 = Node 0 → hash % 4 = Node 3 (moved)
key "user:7"  → hash % 3 = Node 1 → hash % 4 = Node 1 (stayed)
```
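The ~75% figure is easy to check empirically. A small sketch (hash function and key format are arbitrary choices here) that counts how many of a batch of keys change owners when the node count grows from 3 to 4:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn hash_of(key: &str) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    h.finish()
}

/// Fraction of keys whose owning node changes when the cluster
/// grows from `old_n` to `new_n` nodes under modulo placement.
fn moved_fraction(num_keys: u64, old_n: u64, new_n: u64) -> f64 {
    let moved = (0..num_keys)
        .filter(|i| {
            let h = hash_of(&format!("user:{i}"));
            h % old_n != h % new_n
        })
        .count();
    moved as f64 / num_keys as f64
}
```

For a uniform hash, a key stays put only when hash % 12 lands in one of 3 of the 12 residues, so about 3/4 of keys move when going from 3 to 4 nodes.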
Replace modulo hashing with a consistent hash ring + virtual nodes.
- Each node gets a set of virtual nodes placed at hashed positions on a ring.
- To route a key, hash it and walk clockwise to the first virtual node; that virtual node's physical node handles the key.
- When a new node joins, it takes ownership of key ranges from its neighbors. Only keys in those ranges need to be migrated.
- When a node leaves, its key ranges are absorbed by the next clockwise neighbors. Either way, far less data moves than under simple modulo hashing, where almost everything is reassigned.
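A minimal ring sketch under these assumptions: virtual-node positions kept in a sorted list, clockwise walk done by binary search, and a hypothetical `"{node}#{i}"` labeling scheme for virtual nodes. None of these names come from the repo.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn hash64<T: Hash + ?Sized>(t: &T) -> u64 {
    let mut h = DefaultHasher::new();
    t.hash(&mut h);
    h.finish()
}

/// Ring of (position, physical node) pairs, kept sorted by position.
struct Ring {
    points: Vec<(u64, String)>,
}

impl Ring {
    fn new(nodes: &[String], vnodes_per_node: usize) -> Self {
        let mut points: Vec<(u64, String)> = nodes
            .iter()
            .flat_map(|n| {
                (0..vnodes_per_node).map(move |i| (hash64(&format!("{n}#{i}")), n.clone()))
            })
            .collect();
        points.sort();
        Ring { points }
    }

    /// O(log V): binary-search for the first virtual node at or past the
    /// key's position, wrapping around to the start of the ring.
    fn owner(&self, key: &str) -> &str {
        let pos = hash64(key);
        let i = self.points.partition_point(|(p, _)| *p < pos);
        &self.points[i % self.points.len()].1
    }
}
```

Adding a node only inserts new points into the sorted list, so only the keys falling just before those points change owners.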
When the ring changes, the router needs to migrate affected keys:
- Router computes the old and new ring assignments.
- For each affected key range, the router reads keys from the old owner via `GET /kv` and writes them to the new owner via `PUT /kv/{key}`.
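Computing the migration plan reduces to diffing the old and new placements. A sketch generic over any two assignment functions (the `keys_to_migrate` name and the tuple shape are illustrative):

```rust
/// Diff two placement functions: any key whose owner changes must be
/// copied from its old node to its new node (and then deleted).
/// Returns (key, from_node, to_node) triples.
fn keys_to_migrate(
    keys: &[String],
    old_owner: &dyn Fn(&str) -> String,
    new_owner: &dyn Fn(&str) -> String,
) -> Vec<(String, String, String)> {
    keys.iter()
        .filter_map(|k| {
            let (from, to) = (old_owner(k), new_owner(k));
            if from != to { Some((k.clone(), from, to)) } else { None }
        })
        .collect()
}
```

Working per key like this is the brute-force version; a ring implementation can instead compute affected ranges directly and only scan those.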
- Minimal disruption on scale events: adding a node moves only a small fraction of the keys.
- Uniform load distribution: virtual nodes smooth out imbalances from uneven hashing.
- No client changes: the router handles all ring logic internally.
- Lookup complexity: routing a key is now O(log V), a binary search over ring positions, where V is the number of virtual nodes.
- Migration complexity: Key transfer during changes requires more coordination.
- Tuning complexity: too few virtual nodes lead to uneven distribution; too many increase ring lookup overhead and memory use.
- Hash ring implementation - Implement as a sorted list of virtual node positions with binary search for key lookup.
- Router integration - Replace modulo routing with consistent hashing + virtual nodes.
- Data migration - Compute affected ranges, transfer keys between nodes.
- Tuning - Verify key distribution uniformity and use load tests to tune the virtual node count.

