@@ -44,9 +44,17 @@ converge script was updated to reflect the new structure.
4444### 5. Blob store: versitygw (not minio)
4545
4646The plan referenced minio in some contexts. The codebase has already migrated to
47- versitygw. No changes needed for the experiment itself — Thanos Receive and GreptimeDB
48- both use emptyDir, not S3 object storage. Any future production deployment that uses
49- object storage for long-term retention must target the versitygw S3 API, not minio.
47+ versitygw. Both backends were reconfigured to write to versitygw S3 storage
48+ (`blobs-versitygw.ystack.svc.cluster.local`) for storage cost comparison. Bucket-create
49+ jobs provision `thanos-receive` and `greptimedb` buckets using the same minio/mc
50+ pattern as the registry.
51+
52+ ### 8. Thanos 5m block duration override
53+
54+ To make Thanos upload blocks to object storage quickly enough for experiment
55+ observation, `--tsdb.min-block-duration=5m` and `--tsdb.max-block-duration=5m` were
56+ added. The default 2h block duration would mean no S3 uploads during a short
57+ experiment window. This override must NOT be used in production.
5058
5159### 6. configmap-reload sidecar added
5260
@@ -157,6 +165,30 @@ in a single process, which explains the higher baseline.
157165
158166---
159167
168+ ## Object storage comparison
169+
170+ Both backends were configured to write to versitygw S3 buckets. Measurements were taken
171+ after ~17 minutes of dual remote_write with the Thanos block duration forced to 5m.
172+
173+ | Backend | Bucket size | Object count | Write pattern |
174+ |---------|------------|-------------|---------------|
175+ | Thanos Receive | 1.4 MB | 9 files (3 blocks) | Block-based: uploads ~3 files per 5m block (meta.json, index, chunks) |
176+ | GreptimeDB | 252 KB | 11 files | Columnar: writes smaller objects more frequently |
177+
178+ In this run GreptimeDB stored about **5.6x less data** (252 KB vs 1.4 MB) on object storage for the same
179+ metrics workload, which suggests its columnar format compresses better than Thanos's TSDB block format.
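
The 5.6x figure can be reproduced from the two bucket sizes reported in the table. The following is a minimal sketch, assuming decimal units (1 MB = 1000 KB); the function name `size_ratio` is illustrative and not part of the experiment tooling.

```python
def size_ratio(larger_kb: float, smaller_kb: float) -> float:
    """Return how many times larger the first bucket is than the second."""
    return larger_kb / smaller_kb


# Bucket sizes from the comparison table.
# 1.4 MB is taken as 1400 KB (decimal units, an assumption).
thanos_kb = 1.4 * 1000
greptimedb_kb = 252

ratio = size_ratio(thanos_kb, greptimedb_kb)
print(f"Thanos / GreptimeDB = {ratio:.1f}x")
```

If the sizes were reported in binary units instead (1 MB = 1024 KB), the same calculation gives roughly 5.7x, so the headline ratio is sensitive to rounding in the reported bucket sizes.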
180+
181+ **Caveats:**
182+ - Thanos block duration was artificially reduced from 2h to 5m. With default settings,
183+ Thanos would batch more data per block, potentially improving compression ratio.
184+ - Thanos Compactor (not deployed in this experiment) can reduce long-term storage by
185+ merging blocks and applying retention; downsampling alone adds data rather than saving space.
186+ - GreptimeDB's compaction behavior over longer time windows was not tested.
187+ - 17 minutes of data is too short for definitive storage cost projections — a multi-day
188+ test would be more representative.
189+
190+ ---
191+
160192## Evaluation scores
161193
162194Using the criteria from the Mimir replacement research.
@@ -201,38 +233,43 @@ Thanos is significantly lighter. For a local dev cluster this matters.
201233
202234| Backend | Score | Notes |
203235|---------|-------|-------|
204- | Thanos | 8/10 | Uses S3-compatible object storage (versitygw). Well-understood cost model. Compactor reduces storage. |
205- | GreptimeDB | 7/10 | Also supports S3-compatible storage. Uses columnar format which should compress well. Less proven at scale. |
236+ | Thanos | 6/10 | 1.4 MB for ~17 min of data. Block-based format is less space-efficient. Compactor helps long-term but adds operational complexity. |
237+ | GreptimeDB | 9/10 | 252 KB for same data, 5.6x smaller. Columnar format compresses metrics data very well. Fewer bytes = lower S3 storage and egress cost. |
206238
207- Both can target versitygw for object storage. Thanos has a more mature compaction
208- story.
239+ GreptimeDB's columnar storage format produces significantly smaller objects. Both
240+ backends target versitygw S3. While Thanos Compactor can reduce long-term storage,
241+ GreptimeDB's baseline efficiency is notably better.
209242
210243### Weighted total
211244
212245| Backend | Correctness (20%) | Complexity (40%) | Resources (15%) | Maturity (10%) | Storage (15%) | **Total** |
213246|---------|-------------------|-----------------|-----------------|---------------|--------------|-----------|
214- | Thanos | 2.0 | 2.8 | 1.35 | 1.0 | 1.2 | **8.35** |
215- | GreptimeDB | 2.0 | 3.6 | 0.75 | 0.6 | 1.05 | **8.00** |
247+ | Thanos | 2.0 | 2.8 | 1.35 | 1.0 | 0.9 | **8.05** |
248+ | GreptimeDB | 2.0 | 3.6 | 0.75 | 0.6 | 1.35 | **8.30** |
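
Each total in the table is the sum of a backend's 0-10 scores multiplied by the criterion weights. The sketch below reproduces that arithmetic for the updated rows. The storage scores (6 and 9) come from the storage table; the other raw scores are back-derived here from the weighted columns (for example 2.8 / 0.40 = 7) and should be read as assumptions, as should the names `WEIGHTS`, `SCORES`, and `weighted_total`.

```python
# Criterion weights in percent, taken from the table header.
WEIGHTS = {
    "correctness": 20,
    "complexity": 40,
    "resources": 15,
    "maturity": 10,
    "storage": 15,
}

# Raw 0-10 scores. Storage scores are from the storage table; the rest are
# back-derived from the weighted columns (an assumption, not quoted values).
SCORES = {
    "Thanos": {
        "correctness": 10, "complexity": 7, "resources": 9,
        "maturity": 10, "storage": 6,
    },
    "GreptimeDB": {
        "correctness": 10, "complexity": 9, "resources": 5,
        "maturity": 6, "storage": 9,
    },
}


def weighted_total(scores: dict) -> float:
    """Sum score * weight over all criteria; weights are percentages."""
    return sum(WEIGHTS[name] * value for name, value in scores.items()) / 100


for backend, scores in SCORES.items():
    print(f"{backend}: {weighted_total(scores):.2f}")
```

Integer percentage weights keep the intermediate sums exact, so the printed totals match the table without floating-point rounding noise.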
216249
217250---
218251
219252## Recommendation
220253
221- **Thanos wins narrowly (8.35 vs 8.00)**, primarily due to its lower resource footprint
222- and maturity. However, the scores are close enough that the decision should also
223- consider:
254+ **The two backends are essentially tied (Thanos 8.05 vs GreptimeDB 8.30)** after
255+ accounting for measured object storage efficiency. In this short run GreptimeDB's columnar
256+ format produced about 5.6x less data on S3, which flips the storage cost score and puts
257+ GreptimeDB marginally ahead despite Thanos's advantage on maturity and resource usage.
258+
259+ 1. **For ystack local dev clusters**: Thanos is still preferred. Its lighter CPU/memory
260+ footprint matters in constrained k3d environments, and storage cost is less
261+ relevant with emptyDir/local volumes.
224262
225- 1. **For ystack local dev clusters**: Thanos is preferred, since lighter resource usage
226- matters in constrained k3d environments, and the 2-component topology (Receive +
227- Query) is manageable.
263+ 2. **For production multi-cluster with S3 storage costs**: GreptimeDB deserves
264+ serious consideration, since its storage efficiency advantage compounds at scale. The measured
265+ object count was not lower (11 files vs 9 for Thanos), so S3 API call (PUT/GET) savings remain unverified.
228266
229- 2. **For production multi-cluster**: Thanos is preferred, since the Receive component
230- already supports multi-tenancy via labels, and the Query component can federate
231- across multiple Receive instances. Zone-aware ingestion is well-documented.
267+ 3. **Thanos advantages**: CNCF incubating project, battle-tested at massive scale,
268+ well-documented multi-tenancy and zone-aware ingestion, lower runtime resource
269+ footprint.
232270
233- 3. **GreptimeDB remains interesting** for use cases that need SQL access to metrics
234- data or where the standalone deployment model is valued. It could be revisited in
235- a future evaluation as the project matures.
271+ 4. **GreptimeDB advantages**: Simpler single-component topology, better measured
272+ storage efficiency (about 5.6x in this run), SQL access to metrics data, active development pace.
236273
237274## Next steps
238275