Skip to content

Conversation

@xurui-c
Copy link
Member

@xurui-c xurui-c commented Sep 17, 2025

@xurui-c xurui-c force-pushed the rachel/movepolicies branch 5 times, most recently from 4b54c00 to 325e6eb Compare September 19, 2025 04:05
@codecov
Copy link

codecov bot commented Sep 19, 2025

❌ 1 Tests Failed:

Tests completed Failed Passed Skipped
2976 1 2975 12
View the top 1 failed test(s) by shortest run time
tests.web.test_delete_query::test_delete_query_with_rejecting_allocation_policy
Stack Traces | 0.024s run time
Traceback (most recent call last):
  File "/.venv/lib/python3.11........./site-packages/_pytest/runner.py", line 341, in from_call
    result: TResult | None = func()
                             ^^^^^^
  File "/.venv/lib/python3.11........./site-packages/_pytest/runner.py", line 242, in <lambda>
    lambda: runtest_hook(item=item, **kwds), when=when, reraise=reraise
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11....../site-packages/pluggy/_hooks.py", line 513, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11....../site-packages/pluggy/_manager.py", line 120, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11.........................../site-packages/pluggy/_callers.py", line 182, in _multicall
    return outcome.get_result()
           ^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11.../site-packages/pluggy/_result.py", line 100, in get_result
    raise exc.with_traceback(exc.__traceback__)
  File "/.venv/lib/python3.11.........................../site-packages/pluggy/_callers.py", line 167, in _multicall
    teardown.throw(outcome._exception)
  File "/.venv/lib/python3.11....../site-packages/_pytest/threadexception.py", line 92, in pytest_runtest_call
    yield from thread_exception_runtest_hook()
  File "/.venv/lib/python3.11....../site-packages/_pytest/threadexception.py", line 68, in thread_exception_runtest_hook
    yield
  File "/.venv/lib/python3.11.........................../site-packages/pluggy/_callers.py", line 167, in _multicall
    teardown.throw(outcome._exception)
  File "/.venv/lib/python3.11....../site-packages/_pytest/unraisableexception.py", line 95, in pytest_runtest_call
    yield from unraisable_exception_runtest_hook()
  File "/.venv/lib/python3.11....../site-packages/_pytest/unraisableexception.py", line 70, in unraisable_exception_runtest_hook
    yield
  File "/.venv/lib/python3.11.........................../site-packages/pluggy/_callers.py", line 167, in _multicall
    teardown.throw(outcome._exception)
  File "/.venv/lib/python3.11....../site-packages/_pytest/logging.py", line 846, in pytest_runtest_call
    yield from self._runtest_for(item, "call")
  File "/.venv/lib/python3.11....../site-packages/_pytest/logging.py", line 829, in _runtest_for
    yield
  File "/.venv/lib/python3.11.........................../site-packages/pluggy/_callers.py", line 167, in _multicall
    teardown.throw(outcome._exception)
  File "/.venv/lib/python3.11.../site-packages/_pytest/capture.py", line 880, in pytest_runtest_call
    return (yield)
            ^^^^^
  File "/.venv/lib/python3.11.........................../site-packages/pluggy/_callers.py", line 167, in _multicall
    teardown.throw(outcome._exception)
  File "/.venv/lib/python3.11.../site-packages/_pytest/skipping.py", line 257, in pytest_runtest_call
    return (yield)
            ^^^^^
  File "/.venv/lib/python3.11.........................../site-packages/pluggy/_callers.py", line 103, in _multicall
    res = hook_impl.function(*args)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11........./site-packages/_pytest/runner.py", line 174, in pytest_runtest_call
    item.runtest()
  File "/.venv/lib/python3.11....../site-packages/_pytest/python.py", line 1627, in runtest
    self.ihook.pytest_pyfunc_call(pyfuncitem=self)
  File "/.venv/lib/python3.11....../site-packages/pluggy/_hooks.py", line 513, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11....../site-packages/pluggy/_manager.py", line 120, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11.........................../site-packages/pluggy/_callers.py", line 139, in _multicall
    raise exception.with_traceback(exception.__traceback__)
  File "/.venv/lib/python3.11.........................../site-packages/pluggy/_callers.py", line 103, in _multicall
    res = hook_impl.function(*args)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11....../site-packages/_pytest/python.py", line 159, in pytest_pyfunc_call
    result = testfunction(**testargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../tests/web/test_delete_query.py", line 63, in test_delete_query_with_rejecting_allocation_policy
    class RejectPolicy(AllocationPolicy):
  File ".../snuba/utils/registered_class.py", line 93, in __new__
    getattr(res, "_registry").register_class(res)
  File ".../snuba/utils/registered_class.py", line 33, in register_class
    raise RegisteredClassNameCollisionError(
snuba.utils.registered_class.RegisteredClassNameCollisionError: Class with name AllocationPolicy.RejectPolicy already exists in the registry, change the config_key property in the class <class 'test_delete_query.test_delete_query_with_rejecting_allocation_policy.<locals>.RejectPolicy'> or <class 'test_storage_routing.test_routing_strategy_with_rejecting_allocation_policy.<locals>.RejectPolicy'>

To view more test analytics, go to the Test Analytics Dashboard
📋 Got 3 mins? Take this short survey to help us improve Test Analytics.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a refactor of https://github.com/getsentry/snuba/blob/master/snuba/web/db_query.py, because it's also used in outcomes based strategy

if attribution_info.app_id.key == "eap":
return

quota_allowances: dict[str, QuotaAllowance] = {}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to avoid executing https://github.com/getsentry/snuba/blob/master/snuba/web/db_query.py#L848-L890, which is implemented elsewhere in the PR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the reason why I decided to use attribution_info.app_id.key instead of team or feature is because they can also be set to ourlogs even for eap queries

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there are no allocation policies on the storage, you won't execute anything though right?

Copy link
Member Author

@xurui-c xurui-c Sep 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nothing would execute for eap because I added the if statement, but if i didnt do that then https://github.com/getsentry/snuba/blob/master/snuba/web/db_query.py#L848-L890 would've executed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a very hacky way to do this. You're depending on some metadata injected way up somewhere in the pipeline to be set to a specific value.

The much more reasonable thing would be to return early if allocation_policies is empty

if attribution_info.app_id.key == "eap":
return

quota_allowances: dict[str, QuotaAllowance] = {}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just changing the type from Any to QuotaAllowance because that's more accurate

span.description = self.config_key()
self.routing_context = RoutingContext(timer=self._timer, in_msg=in_msg)
self.routing_context = RoutingContext(
timer=self._timer, in_msg=in_msg, query_id=uuid.uuid4().hex
Copy link
Member Author

@xurui-c xurui-c Sep 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This query_id is later needed for getting & updating quota allowances, so might as well just set it here. Originally query_id is randomly generated right before applying allocation policies, and it seems like the only thing that matters is that query_id stays consistent for getting & updating quota allowances, because it later gets reset in the query pipeline, and ofc for querylog

"request_invalid",
tags=self._timer.tags,
)
elif isinstance(error, AllocationPolicyViolations):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So actually, this metric was never raised because if therre is a AllocationPolicyViolations, then we raise a QueryException: https://github.com/getsentry/snuba/blob/master/snuba/web/db_query.py#L667-L690

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot 2025-09-21 at 8 05 49 PM

out = self._execute(in_msg)
else:
raise AllocationPolicyViolations
self.metrics.increment(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also open to just not having this metric because we emit more detailed ones in output_metrics

)
# thoughts on making AlocationPolicyViolations a subclass of QueryException? QueryOrError expects QueryException
raise QueryException.from_args(
AllocationPolicyViolations.__name__,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

output_metrics_error = None
try:
self.routing_decision.strategy.output_metrics(self.routing_decision)
if self.routing_decision.routing_context.query_result is not None or isinstance(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tags=self._timer.tags,
)
else:
elif not isinstance(error, QueryException):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is why I wanted AllocationPolicyViolations to inherit from QueryException (and then I would raise AllocationPolicyViolations as AllocationPolicyViolations, so then this metric gets emitted for non-allocation policy violation errors, since rate limiting shouldn't count as request_error anyways

) as span:
request_meta = extract_message_meta(routing_context.in_msg)
recommendations[allocation_policy_name] = allocation_policy.get_quota_allowance(
{
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked that this is what tenant_ids is across all endpoints: https://github.com/getsentry/snuba/blob/master/snuba/web/db_query.py#L825

Maybe we should pull this into RoutingContext, but it's getting a little bloated idk

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's only one project id we use the project id

)
record_query(_construct_hacky_querylog_payload(self, routing_decision))

@final
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

},
"time_budget": 8000,
},
"allocation_policies_recommendations": {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what the new querylog payload looks like;

{
 "sql": "",
 "sql_anonymized": "",
 "start_timestamp": 1758491216,
 "end_timestamp": 1758494816,
 "stats": {
  "final": false,
  "cache_hit": 0,
  "max_threads": 0,
  "clickhouse_table": "na",
  "query_id": "dc409fabfddd452f8c842511a24dcc62",
  "is_duplicate": 0,
  "consistent": false,
  "can_run": true,
  "is_throttled": false,
  "strategy": "MetricsStrategy",
  "source_request_id": "fe40d336-153b-4257-8a3e-b9b5fec9f854",
  "extra_info": {
   "sampling_in_storage_estimation_time_overhead": {
    "type": "timing",
    "value": 19,
    "tags": null
   },
   "time_budget": 8000,
   "sampling_in_storage_routing_success": {
    "type": "increment",
    "value": 1,
    "tags": {
     "tier": "TIER_8"
    }
   },
   "sampling_in_storage_my_metric": {
    "type": "increment",
    "value": 1,
    "tags": {
     "a": "b",
     "c": "d"
    }
   },
   "sampling_in_storage_query_timing": {
    "type": "timing",
    "value": 1.0,
    "tags": {
     "tier": "TIER_8"
    }
   }
  },
  "clickhouse_settings": {},
  "result_info": {
   "meta": {},
   "profile": {
    "bytes": 420,
    "elapsed": 1.0
   },
   "stats": {
    "stat": 1
   },
   "sql": "SELECT * FROM your_mom"
  },
  "routed_tier": "TIER_8",
  "allocation_policies_recommendations": {
   "ConcurrentRateLimitAllocationPolicy": {
    "can_run": true,
    "max_threads": 10,
    "explanation": {},
    "is_throttled": false,
    "throttle_threshold": 22,
    "rejection_threshold": 22,
    "quota_used": 1,
    "quota_unit": "concurrent_queries",
    "suggestion": "no_suggestion",
    "max_bytes_to_read": 0
   },
   "ReferrerGuardRailPolicy": {
    "can_run": true,
    "max_threads": 10,
    "explanation": {},
    "is_throttled": false,
    "throttle_threshold": 1000000000000,
    "rejection_threshold": 1000000000000,
    "quota_used": 0,
    "quota_unit": "no_units",
    "suggestion": "no_suggestion",
    "max_bytes_to_read": 0
   },
   "BytesScannedRejectingPolicy": {
    "can_run": true,
    "max_threads": 10,
    "explanation": {},
    "is_throttled": false,
    "throttle_threshold": 1000000000000,
    "rejection_threshold": 1000000000000,
    "quota_used": 0,
    "quota_unit": "no_units",
    "suggestion": "no_suggestion",
    "max_bytes_to_read": 0
   }
  }
 },
 "status": "0",
 "trace_id": "",
 "profile": {
  "time_range": null,
  "table": "eap_items",
  "all_columns": [],
  "multi_level_condition": false,
  "where_profile": {
   "columns": [],
   "mapping_cols": []
  },
  "array_join_cols": [],
  "groupby_cols": []
 },
 "result_profile": {
  "bytes": 420,
  "progress_bytes": 0,
  "elapsed": 1.0
 },
 "request_status": "na",
 "slo": "na"
}

snuba/util.py Outdated
Comment on lines 27 to 41
def get_max_bytes_to_read(quota_allowances: dict[str, QuotaAllowance]) -> int:
"""
alternative implementation:
non_zero_max_bytes = [qa.max_bytes_to_read for qa in quota_allowances.values() if qa.max_bytes_to_read > 0]
return min(non_zero_max_bytes) if non_zero_max_bytes else 0
"""
max_bytes_to_read = min(
[qa.max_bytes_to_read for qa in quota_allowances.values()],
key=lambda mb: float("inf") if mb == 0 else mb,
)
if max_bytes_to_read != 0:
return max_bytes_to_read
return 0


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is too high up in the folder structure. It's specific to allocation policies

"request_rate_limited",
tags=self._timer.tags,
)
# thoughts on making AlocationPolicyViolations a subclass of QueryException? QueryOrError expects QueryException
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure go for it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decided against it because it's suppsoed to be chained together not inheritance relationship: https://github.com/getsentry/snuba/blob/master/snuba/web/__init__.py#L20-L22

)

def _get_routing_decision(self, routing_context: RoutingContext) -> RoutingDecision:
def _get_combined_allocation_policies_recommendations(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other routing strategies will use this. I would put this into a utility function in the allocation policies. All this is doing is combining the results of many policies which should fall into the responsibility of the allocation policy subsystem

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in person, going to put this in BaseRoutingStrategy and other strats can override it. It's up to the routing strategy to decide how to combine recommendations, whether its voting or weighted voting or all or any or whatever idk

@xurui-c xurui-c force-pushed the rachel/movepolicies branch 3 times, most recently from 5aaa5c4 to 6731d49 Compare September 23, 2025 21:57
@xurui-c xurui-c changed the title tryna move the allocation policies ref(cbrs): move EAP allocation policies into the routing strategy layer Sep 23, 2025
"experiments": {},
},
)
error.__cause__ = AllocationPolicyViolations.from_args(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the QueryException "chaining"

@xurui-c xurui-c marked this pull request as ready for review September 23, 2025 22:20
@xurui-c xurui-c requested review from a team as code owners September 23, 2025 22:20
Comment on lines 380 to 394
recommendations[allocation_policy_name] = allocation_policy.get_quota_allowance(
{
"organization_id": request_meta.organization_id,
"referrer": request_meta.referrer,
},
routing_context.query_id,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential bug: The call to get_quota_allowance is missing the required project_id, causing two critical allocation policies to systematically fail for all EAP queries.
  • Description: The get_quota_allowance method is called with a dictionary containing only organization_id and referrer. However, two of the three allocation policies, ConcurrentRateLimitAllocationPolicy and BytesScannedRejectingPolicy, require a project_id. This omission causes them to raise an InvalidTenantsForAllocationPolicy exception. While this exception is handled and results in a can_run=False response instead of a crash, it constitutes a significant functional bug. It means that these critical resource management policies are not being correctly applied to EAP queries, leading to all such queries being rejected due to missing tenant information rather than actual resource constraints.

  • Suggested fix: The project_id needs to be extracted from request_meta.project_ids and included in the dictionary passed to allocation_policy.get_quota_allowance. Since project_ids is a list, logic should be added to select the appropriate project_id from the list, likely the first one if the list is not empty.
    severity: 0.7, confidence: 0.95

Did we get this right? 👍 / 👎 to inform future reviews.

@xurui-c xurui-c requested a review from volokluev September 23, 2025 22:26
Comment on lines 38 to 52
def get_max_bytes_to_read(quota_allowances: dict[str, QuotaAllowance]) -> int:
"""
alternative implementation:
non_zero_max_bytes = [qa.max_bytes_to_read for qa in quota_allowances.values() if qa.max_bytes_to_read > 0]
return min(non_zero_max_bytes) if non_zero_max_bytes else 0
"""
max_bytes_to_read = min(
[qa.max_bytes_to_read for qa in quota_allowances.values()],
key=lambda mb: float("inf") if mb == 0 else mb,
)
if max_bytes_to_read != 0:
return max_bytes_to_read
return 0


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this file already has a bunch of stuff in it. I would put this function it its own file

if attribution_info.app_id.key == "eap":
return

quota_allowances: dict[str, QuotaAllowance] = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a very hacky way to do this. You're depending on some metadata injected way up somewhere in the pipeline to be set to a specific value.

The much more reasonable thing would be to return early if allocation_policies is empty

Comment on lines 40 to 42
alternative implementation:
non_zero_max_bytes = [qa.max_bytes_to_read for qa in quota_allowances.values() if qa.max_bytes_to_read > 0]
return min(non_zero_max_bytes) if non_zero_max_bytes else 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the point of this docstring?

if self.routing_decision.routing_context.query_result is not None or isinstance(
error, QueryException
):
self.routing_decision.strategy.update_allocation_policies_balances(self.routing_decision, error) # type: ignore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't quite smell right to me. The routing strategy should contain this functionality within itself. it could be that output_metrics is not a good name for the method being called above but overall, there's something a routing strategy is supposed to do before the call (decide the routing tier/time window/ etc) and after the call (output metrics, update quota balances, etc). This is relying on the caller to know something three layers deep. That is a leaky abstraction

) as span:
request_meta = extract_message_meta(routing_context.in_msg)
recommendations[allocation_policy_name] = allocation_policy.get_quota_allowance(
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's only one project id we use the project id

@xurui-c xurui-c force-pushed the rachel/movepolicies branch 2 times, most recently from 48bf012 to dc0294c Compare September 30, 2025 23:36
@xurui-c xurui-c requested a review from volokluev October 2, 2025 17:01
Copy link
Member

@volokluev volokluev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good progress, a few usability questions and suggestions

Comment on lines 371 to 374
elif not (
isinstance(error, QueryException)
and error.exception_type == AllocationPolicyViolations.__name__
):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this check is kind of ugly, is there no other way to do it?

Comment on lines 387 to 393
if routing_strategy_after_execute_error is not None:
self.metrics.increment("after_execute_failure")
sentry_sdk.capture_message(
f"Error in routing strategy output metrics: {output_metrics_error}"
f"Error in routing strategy after execute: {routing_strategy_after_execute_error}"
)
if settings.RAISE_ON_ROUTING_STRATEGY_FAILURES:
raise output_metrics_error
raise routing_strategy_after_execute_error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this exception handling should probably be taken care of in the routing strategies

Comment on lines 156 to 171
combined_allocation_policies_recommendations = (
self._get_combined_allocation_policies_recommendations(routing_context)
)

routing_decision = RoutingDecision(
routing_context=routing_context,
strategy=self,
tier=Tier.TIER_1,
clickhouse_settings={},
can_run=True,
clickhouse_settings=combined_allocation_policies_recommendations["settings"],
can_run=combined_allocation_policies_recommendations["can_run"],
is_throttled=combined_allocation_policies_recommendations["is_throttled"],
)

if not routing_decision.can_run:
return routing_decision

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have more than one routing strategy in use right now. Can we provide sane default behavior so that this doesn't have to be copy-pasted into every new routing strategy we use?

Comment on lines 562 to 573
with mock.patch.object(
BaseRoutingStrategy,
"get_allocation_policies",
return_value=[
ThrottleAllocationPolicy(
ResourceIdentifier(StorageKey("doesntmatter")), ["a", "b", "c"], {}
),
ThrottleAllocationPolicyDuplicate(
ResourceIdentifier(StorageKey("doesntmatter")), ["a", "b", "c"], {}
),
],
):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why mock when you can extend and override?

@xurui-c xurui-c force-pushed the rachel/movepolicies branch from 789a200 to c7cfcda Compare October 13, 2025 21:43
@xurui-c xurui-c requested a review from volokluev October 14, 2025 18:19
Comment on lines 200 to 208
tenant_ids={
"organization_id": request_meta.organization_id,
"referrer": request_meta.referrer,
**(
{"project_id": request_meta.project_ids[0]}
if hasattr(request_meta, "project_ids") and len(request_meta.project_ids) == 1
else {}
),
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like something that should be done in the routing strategy, not the rpc endpoint

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not set an empty tenant_ids in rpc endpoint and then populate it in routing strategy

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plus, routing strategy selector should be using the fields set in tenant_ids (ie organization_id & project_id), and routing strategy selector happens before routing strategy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need tenant_ids at all in the routing context. This is information can entirely be calculated from the request_meta object. A cleaner solution would be to define a tenant_ids property:

class RoutingContext:

    @property
    def tenant_ids(self):
        return {
                "organization_id": request_meta.organization_id,
                "referrer": request_meta.referrer,
                **(
                    {"project_id": request_meta.project_ids[0]}
                    if hasattr(request_meta, "project_ids") and len(request_meta.project_ids) == 1
                    else {}
                )
            }

"request_rate_limited",
tags=self._timer.tags,
)
error = QueryException.from_args(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a reason why you have to wrap it in a QueryException

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A combination of:

  1. this is an existing pattern: https://github.com/getsentry/snuba/blob/master/snuba/web/db_query.py#L682-L691.
  2. I want this to be caught by: https://github.com/getsentry/snuba/blob/master/snuba/web/rpc/__init__.py#L424-L425
  3. Making AllocationPolicyViolations inherit from QueryException goes against the initial design: ref(cbrs): move EAP allocation policies into the routing strategy layer #7411 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment is not blocking for approval but here are my thoughts in reply:

  1. I think this pattern has always been fairly dysfunctional and we're just perpetuating it. The only thing that really expects a QueryException is the AllocationPolicy:

https://github.com/getsentry/snuba/blob/master/snuba/query/allocation_policies/__init__.py#L45

but this could easily be changed to allow any kind of errors, especially AllocationPolicyViolation(s)

  1. You could make a RPCAllocationPolicyException(RPCException) class and raise it here and it would be caught

  2. You don't need to inherit it from QueryException it can just be its own thing. QueryException denotes a problem with clickhouse query execution and we're now muddying its meaning to also mean the RPC request failed which it could be two differeint things

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're not going to make the updates above, at the very least I would ask that this ugly chaining thing be abstracted into its own component rather than taking up a whole bunch of space in the RPC method. This is a core component and its functionality should be easy to understand at a glance

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heres a place that also expects a QueryException: https://github.com/getsentry/snuba/blob/master/snuba/web/views.py#L468-L471

I think for now I'll just abstract it into its own component and make a follow up pr

@xurui-c xurui-c requested a review from volokluev October 15, 2025 18:10
@xurui-c xurui-c force-pushed the rachel/movepolicies branch from ad16694 to 0282a73 Compare October 15, 2025 18:14
@xurui-c xurui-c force-pushed the rachel/movepolicies branch from 20288e8 to e2015f2 Compare October 15, 2025 19:40
Copy link
Member

@volokluev volokluev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving with recommended changes so you're not blocked while I'm on PTO. Please give consideration to my suggestions though and make sure you have a rollout plan

@xurui-c xurui-c merged commit 76d6e0b into master Oct 16, 2025
34 checks passed
@xurui-c xurui-c deleted the rachel/movepolicies branch October 16, 2025 20:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants