Commit 27a6b73

Yolean macbot01 and claude
authored and committed
Update experiment report with S3 storage cost comparison
Both backends now write to versitygw. GreptimeDB's columnar format produces 5.6x less data (252 KB vs 1.4 MB) for the same metrics workload. This flips the storage cost score and brings the weighted totals to a near-tie (Thanos 8.05 vs GreptimeDB 8.30).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 144cbde commit 27a6b73

1 file changed

+58
-21
lines changed

tmp-migration-plans/metrics-v2-experiment-results.md

Lines changed: 58 additions & 21 deletions
Original file line number | Diff line number | Diff line change
@@ -44,9 +44,17 @@ converge script was updated to reflect the new structure.
4444
### 5. Blob store: versitygw (not minio)
4545

4646
The plan referenced minio in some contexts. The codebase has already migrated to
47-
versitygw. No changes needed for the experiment itself — Thanos Receive and GreptimeDB
48-
both use emptyDir, not S3 object storage. Any future production deployment that uses
49-
object storage for long-term retention must target the versitygw S3 API, not minio.
47+
versitygw. Both backends were reconfigured to write to versitygw S3 storage
48+
(`blobs-versitygw.ystack.svc.cluster.local`) for storage cost comparison. Bucket-create
49+
jobs provision `thanos-receive` and `greptimedb` buckets using the same minio/mc
50+
pattern as the registry.
51+
52+
### 8. Thanos 5m block duration override
53+
54+
To make Thanos upload blocks to object storage quickly enough for experiment
55+
observation, `--tsdb.min-block-duration=5m` and `--tsdb.max-block-duration=5m` were
56+
added. The default 2h block duration would mean no S3 uploads during a short
57+
experiment window. This override must NOT be used in production.
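The effect of the override on a short experiment can be sketched with a little arithmetic. This is a deliberately simplified model, not Thanos's actual scheduling logic: it assumes a block becomes eligible for upload as soon as its time window closes and ignores any head-compaction or shipper delay. The function name `completed_blocks` is hypothetical, and the 17-minute window is the approximate figure reported in the object storage comparison.

```python
def completed_blocks(window_minutes: float, block_minutes: float) -> int:
    """Count whole blocks that fit inside the observation window.

    Simplifying assumption: a block is uploadable the moment its
    time range has fully elapsed.
    """
    return int(window_minutes // block_minutes)


experiment_minutes = 17  # approximate dual remote_write duration in this report

# With the 5m override, three full blocks fit in the window.
print(completed_blocks(experiment_minutes, 5))    # 3

# With the default 2h block duration, no block completes in time.
print(completed_blocks(experiment_minutes, 120))  # 0
```

Under these assumptions the 5m setting yields three blocks in the window, while the 2h default yields none, which is why the override was needed for observation only.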
5058

5159
### 6. configmap-reload sidecar added
5260

@@ -157,6 +165,30 @@ in a single process, which explains the higher baseline.
157165

158166
---
159167

168+
## Object storage comparison
169+
170+
Both backends were configured to write to versitygw S3 buckets. Measured after ~17 minutes
171+
of dual remote_write with Thanos block duration forced to 5m.
172+
173+
| Backend | Bucket size | Object count | Write pattern |
174+
|---------|------------|-------------|---------------|
175+
| Thanos Receive | 1.4 MB | 9 files (3 blocks) | Block-based: uploads ~3 files per 5m block (meta.json, index, chunks) |
176+
| GreptimeDB | 252 KB | 11 files | Columnar: writes smaller objects more frequently |
177+
178+
GreptimeDB stores **5.6x less data** on object storage for the same metrics workload.
179+
Its columnar format compresses significantly better than Thanos's TSDB block format.
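As a quick sanity check of the 5.6x figure, the ratio can be recomputed from the bucket sizes in the table. The sizes are rounded as reported, and whether "MB" means 1000 KB or 1024 KB is not stated, so both conventions are shown here as assumptions.

```python
def size_ratio(larger_kb: float, smaller_kb: float) -> float:
    """Return how many times larger the first bucket is than the second."""
    return larger_kb / smaller_kb


GREPTIMEDB_KB = 252            # from the table
THANOS_KB_DECIMAL = 1.4 * 1000  # assumes 1 MB = 1000 KB
THANOS_KB_BINARY = 1.4 * 1024   # assumes 1 MB = 1024 KB

print(round(size_ratio(THANOS_KB_DECIMAL, GREPTIMEDB_KB), 2))  # 5.56
print(round(size_ratio(THANOS_KB_BINARY, GREPTIMEDB_KB), 2))   # 5.69
```

Either convention lands close to the reported 5.6x, so the headline ratio is consistent with the table within rounding.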
180+
181+
**Caveats:**
182+
- Thanos block duration was artificially reduced from 2h to 5m. With default settings,
183+
Thanos would batch more data per block, potentially improving compression ratio.
184+
- Thanos Compactor (not deployed in this experiment) further reduces long-term storage
185+
by merging and downsampling blocks.
186+
- GreptimeDB's compaction behavior over longer time windows was not tested.
187+
- 17 minutes of data is too short for definitive storage cost projections — a multi-day
188+
test would be more representative.
189+
190+
---
191+
160192
## Evaluation scores
161193

162194
Using the criteria from the Mimir replacement research.
@@ -201,38 +233,43 @@ Thanos is significantly lighter. For a local dev cluster this matters.
201233

202234
| Backend | Score | Notes |
203235
|---------|-------|-------|
204-
| Thanos | 8/10 | Uses S3-compatible object storage (versitygw). Well-understood cost model. Compactor reduces storage. |
205-
| GreptimeDB | 7/10 | Also supports S3-compatible storage. Uses columnar format which should compress well. Less proven at scale. |
236+
| Thanos | 6/10 | 1.4 MB for ~17 min of data. Block-based format is less space-efficient. Compactor helps long-term but adds operational complexity. |
237+
| GreptimeDB | 9/10 | 252 KB for same data — 5.6x smaller. Columnar format compresses metrics data very well. Fewer bytes = lower S3 storage and egress cost. |
206238

207-
Both can target versitygw for object storage. Thanos has a more mature compaction
208-
story.
239+
GreptimeDB's columnar storage format produces significantly smaller objects. Both
240+
backends target versitygw S3. While Thanos Compactor can reduce long-term storage,
241+
GreptimeDB's baseline efficiency is notably better.
209242

210243
### Weighted total
211244

212245
| Backend | Correctness (20%) | Complexity (40%) | Resources (15%) | Maturity (10%) | Storage (15%) | **Total** |
213246
|---------|-------------------|-----------------|-----------------|---------------|--------------|-----------|
214-
| Thanos | 2.0 | 2.8 | 1.35 | 1.0 | 1.2 | **8.35** |
215-
| GreptimeDB | 2.0 | 3.6 | 0.75 | 0.6 | 1.05 | **8.00** |
247+
| Thanos | 2.0 | 2.8 | 1.35 | 1.0 | 0.9 | **8.05** |
248+
| GreptimeDB | 2.0 | 3.6 | 0.75 | 0.6 | 1.35 | **8.30** |
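The totals in the table can be reproduced with a short weighted-sum sketch. Only the storage scores (6 and 9) are stated explicitly in this diff; the other raw scores below are back-computed from the weighted columns (for example 2.8 / 0.40 = 7) and should be read as assumptions.

```python
# Criterion weights taken from the table header.
WEIGHTS = {
    "correctness": 0.20,
    "complexity": 0.40,
    "resources": 0.15,
    "maturity": 0.10,
    "storage": 0.15,
}

# Raw scores out of 10. Storage is stated in the report; the rest are
# back-computed from the weighted columns and are assumptions.
SCORES = {
    "Thanos": {"correctness": 10, "complexity": 7, "resources": 9,
               "maturity": 10, "storage": 6},
    "GreptimeDB": {"correctness": 10, "complexity": 9, "resources": 5,
                   "maturity": 6, "storage": 9},
}


def weighted_total(scores: dict) -> float:
    """Sum each raw score multiplied by its criterion weight."""
    return round(sum(WEIGHTS[name] * scores[name] for name in WEIGHTS), 2)


for backend, scores in SCORES.items():
    print(f"{backend}: {weighted_total(scores):.2f}")
# Thanos: 8.05
# GreptimeDB: 8.30

# Swapping in the previous storage score (8/10) gives the old Thanos total.
print(f"{weighted_total({**SCORES['Thanos'], 'storage': 8}):.2f}")  # 8.35
```

Because storage carries a 15% weight, each one-point change in that score moves a backend's total by 0.15, which accounts for the shift from 8.35 vs 8.00 to 8.05 vs 8.30.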
216249

217250
---
218251

219252
## Recommendation
220253

221-
**Thanos wins narrowly (8.35 vs 8.00)**, primarily due to its lower resource footprint
222-
and maturity. However, the scores are close enough that the decision should also
223-
consider:
254+
**The two backends are essentially tied (Thanos 8.05 vs GreptimeDB 8.30)** after
255+
accounting for measured object storage efficiency. GreptimeDB's columnar format
256+
produces 5.6x less data on S3, which flips the storage cost score and offsets
257+
Thanos's advantage on maturity and resource usage.
258+
259+
1. **For ystack local dev clusters**: Thanos is still preferred — lighter CPU/memory
260+
footprint matters in constrained k3d environments, and storage cost is less
261+
relevant with emptyDir/local volumes.
224262

225-
1. **For ystack local dev clusters**: Thanos is preferred — lighter resource usage
226-
matters in constrained k3d environments, and the 2-component topology (Receive +
227-
Query) is manageable.
263+
2. **For production multi-cluster with S3 storage costs**: GreptimeDB deserves
264+
serious consideration — its storage efficiency advantage compounds at scale,
265+
though it wrote more objects than Thanos in this run (11 vs 9), so S3 request (PUT/GET) costs may not fall.
228266

229-
2. **For production multi-cluster**: Thanos is preferred — the Receive component
230-
already supports multi-tenancy via labels, and the Query component can federate
231-
across multiple Receive instances. Zone-aware ingestion is well-documented.
267+
3. **Thanos advantages**: CNCF graduated maturity, battle-tested at massive scale,
268+
well-documented multi-tenancy and zone-aware ingestion, lower runtime resource
269+
footprint.
232270

233-
3. **GreptimeDB remains interesting** for use cases that need SQL access to metrics
234-
data or where the standalone deployment model is valued. It could be revisited in
235-
a future evaluation as the project matures.
271+
4. **GreptimeDB advantages**: Simpler single-component topology, dramatically better
272+
storage efficiency, SQL access to metrics data, active development pace.
236273

237274
## Next steps
238275
