feat(cbrs): Time Window Routing #7418

volokluev · 2025-09-17T23:58:25Z

This PR implements a routing strategy that routes to TIER_1 and, instead of downsampling to lower storage tiers, shrinks the time window.

Design decisions

Window Sizing Algorithm

This is the easiest thing to change in this PR (and it will likely change). At the moment, it looks up the number of outcomes for the requested time range and shrinks the time window accordingly, assuming that datapoints are distributed uniformly across time (which is not true). The most recent datapoints are prioritized.
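A minimal sketch of the linear shrinking described above, based on the `factor` and `start_timestamp_proto` lines in the diff. The function name `shrink_time_window`, the plain integer timestamps, and the "return None when the item count already fits" condition are assumptions for illustration, not the code in this PR.

```python
import math
from typing import Optional, Tuple


def shrink_time_window(
    start_ts: int, end_ts: int, ingested_items: int, max_items: int
) -> Optional[Tuple[int, int]]:
    """Return a narrowed (start, end) window in seconds, or None if no change is needed.

    Assumes items are spread uniformly over the window, which the PR
    description notes is not actually true.
    """
    if max_items <= 0:
        raise ValueError("max_items must be positive")

    # How many times more items were ingested than we are willing to scan.
    factor = ingested_items / max_items
    if factor <= 1:
        # Assumption: the window already fits, so there is nothing to adjust.
        return None

    window_length = end_ts - start_ts
    # Keep the end fixed and move the start forward so the most recent
    # data is served first.
    new_start = end_ts - math.floor(window_length / factor)
    return (new_start, end_ts)
```

For example, a one-hour window holding ten times more items than the budget would shrink to its most recent six minutes under this model.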

Pagination

Only the TraceItemTable endpoint makes use of the recommendations of this routing strategy; the endpoint and the routing strategy interact across queries. This facilitates a simple client-side UX where all the client has to do is pass the page_token across requests and not worry about anything else.

Here's a diagram explaining the flow:

┌─────────────────┐              ┌────────────────┐                               
│                 │              │                │                               
│                 │              │                │                               
│                 │              │   Routing      │                               
│   Client        ├─────────────►│   Strategy     ┼───────────────────┐           
│                 │  page_token  │                │                   │           
│                 │              │                │                   │           
│                 │              │                │            Narrows│time window
└─────────────────┘              └────────────────┘            For Endpoint       
        ▲                                                             │           
        │                                                             │           
        │                         ┌────────────────┐                  │           
        │                         │                │                  │           
        │                         │                │                  │           
        │                         │ TraceItemTable │◄─────────────────┘           
        └─────────────────────────┼ Endpoint       │                              
             Encodes              │                │                              
             Time Window          └────────────────┘                              
             In Page Token

To facilitate pagination, the TraceItemTable endpoint now queries for limit + 1 rows so that it knows whether there are more items in the current window or whether it can move on to the next one.
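The limit + 1 check can be sketched roughly as follows. `fetch_page` and the `fetch_rows` callable are hypothetical names used only for illustration; the real endpoint issues a ClickHouse query and encodes the time window in the page token, neither of which is shown here.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def fetch_page(
    fetch_rows: Callable[[int], Sequence[T]], limit: int
) -> Tuple[List[T], bool]:
    """Return (rows for this page, whether the current window has more rows).

    fetch_rows is a hypothetical stand-in for running the query with the
    given row limit against the current time window.
    """
    # Ask for one extra row; it is never returned to the client.
    rows = fetch_rows(limit + 1)
    has_more_in_window = len(rows) > limit
    return list(rows[:limit]), has_more_in_window
```

If `has_more_in_window` is false, the next page token can point at the next time window rather than at an offset within the current one.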

What's missing

This functionality can be tested more rigorously. I deliberately did not spend too much time on testing because I know the implementation will change, and the priority is to get something out there to try.
We could probably have more observability into what the strategy is doing and a better understanding of our success metrics. These will be added as we understand the problem more.

xurui-c · 2025-09-18T16:46:58Z

snuba/web/rpc/storage_routing/routing_strategies/outcomes_flex_time.py

+        routing_context.extra_info["estimation_sql"] = res.extra.get("sql", "")
+        return cast(int, res.result.get("data", [{}])[0].get("num_items", 0))
+
+    def _adjust_time_window(self, routing_context: RoutingContext) -> TimeWindow | None:


when does this function return None?

if there is no adjustment to be made

xurui-c · 2025-09-18T17:05:45Z

snuba/web/rpc/storage_routing/routing_strategies/outcomes_flex_time.py

+            window_length = original_end_ts - original_start_ts
+
+            start_timestamp_proto = TimestampProto(
+                seconds=original_end_ts - math.floor((window_length / factor))


Is this so we prioritize more recent data? and the user will paginate forwards?

xurui-c · 2025-09-18T17:08:02Z

snuba/web/rpc/storage_routing/routing_strategies/storage_routing.py

+    start_timestamp: TimestampProto
+    end_timestamp: TimestampProto
+
+    def length_hours(self) -> float:


where is this used?

debugging :D

volokluev · 2025-09-18T20:44:13Z

snuba/web/rpc/storage_routing/routing_strategies/outcomes_flex_time.py

+        ingested_items = self.get_ingested_items_for_timerange(
+            routing_context, original_time_window
+        )
+        factor = ingested_items / max_items


this is how we actually shrink the time window

volokluev · 2025-09-18T20:46:11Z

snuba/web/rpc/storage_routing/routing_strategies/outcomes_flex_time.py

+# TODO import these from sentry-relay
+class OutcomeCategory:
+    SPAN_INDEXED = 16
+    LOG_ITEM = 23


copy pasta, remove

volokluev · 2025-09-18T20:46:19Z

snuba/web/rpc/storage_routing/routing_strategies/outcomes_flex_time.py

+_ITEM_TYPE_TO_OUTCOME = {
+    TraceItemType.TRACE_ITEM_TYPE_SPAN: OutcomeCategory.SPAN_INDEXED,
+    TraceItemType.TRACE_ITEM_TYPE_LOG: OutcomeCategory.LOG_ITEM,
+}


copy pasted, remove this

volokluev · 2025-09-18T20:46:43Z

snuba/web/rpc/storage_routing/routing_strategies/outcomes_flex_time.py

+                    column("category"),
+                    _ITEM_TYPE_TO_OUTCOME.get(
+                        in_msg_meta.trace_item_type,
+                        OutcomeCategory.SPAN_INDEXED,


this is wrong, the default should not be span indexed

codecov · 2025-09-19T17:53:27Z

✅ All tests passed in 1293.61s

volokluev added 8 commits September 12, 2025 14:21

wip

4d1c7b0

running test that does ... something?

b2c3a9a

update routing strategy to use flextime

75169d2

scaffolding test and trace item table integration

6a9fd6c

pull out outcomes things

5ccf286

basic linear model

9960db2

prototype of an end to end test

9ddf45d

working test with pagination

db183df

volokluev requested review from a team as code owners September 17, 2025 23:58

volokluev changed the title ~~feat(cbrs): Time Window Routing~~ feat(cbrs): Time Window Routing (DO NOT MERGE) Sep 17, 2025

xurui-c reviewed Sep 18, 2025


properly slice things

ed57fba

volokluev commented Sep 18, 2025


volokluev and others added 7 commits September 18, 2025 14:28

Fix some unit tests and break the one I wrote :D

e04dd4f

add some comments, fix pagination

0e227b4

kruft

4cb222c

remove copypasta

d7e8dc3

Merge branch 'master' into volo/time_window_routing

ae3f048

cleanup

38ccf1b

mypy fixes

25ccc5d

volokluev changed the title ~~feat(cbrs): Time Window Routing (DO NOT MERGE)~~ feat(cbrs): Time Window Routing Sep 19, 2025

volokluev added 2 commits September 19, 2025 11:12

test fix

5bc4663

clear db for outcomes MVs

1ddb242

remove erroneous test

3ebfb4d
