It would be nice if we could say "at most 50 tasks fetching URLs at once, and at most 5 of those for any given hostname". So that might look something like:
```python
await run_on_each(fetch, urls, max_at_once=50, key=hostname_from_url, max_at_once_per_key=5)
```

This will create a dramatic increase in complexity. Considerations include:
- If you already have 5 tasks running for host A, and then you keep pulling from `urls` and they keep being host A, then you may end up doing unbounded buffering.
- Worse, if there's a bounded set of possible hosts, then you shouldn't keep pulling from `urls` if all the possible hosts are already saturated... but how do you know if the key set is bounded?
- If the input iterable is a priority queue (like it should be for e.g. a depth-first directory traversal), then buffering items internally will mess up the prioritization.
- We don't want to keep around lots of meter state for keys that showed up once a while ago but aren't showing up now. We'd like to drop those meter state objects. How can we figure out when it's safe to do that? For a `MaxMeter` it's just: the current number of tasks is 0. For a `TokenBucketMeter`, it's more complicated...
- Do you support just one key, or multiple keys with different meter sets? Multiple keys dramatically increases the complexity again.
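For concreteness, here's a minimal sketch of what combined global + per-key limiting could look like using plain `asyncio` primitives. The signature mirrors the proposed API, but the body is an illustrative guess, not the library's implementation — and note that it deliberately exhibits the first problem above: it spawns a task per item up front, so it does unbounded buffering when a key is saturated.

```python
import asyncio
from collections import defaultdict

async def run_on_each(func, items, *, max_at_once, key, max_at_once_per_key):
    """Naive sketch: a global semaphore plus one lazily created
    semaphore per key. Mirrors the proposed API, not a real design."""
    global_sem = asyncio.Semaphore(max_at_once)
    # Per-key semaphores are created on demand and never dropped,
    # which is exactly the meter-state cleanup problem noted above.
    key_sems = defaultdict(lambda: asyncio.Semaphore(max_at_once_per_key))

    async def worker(item):
        # Hold both the global slot and the per-key slot while running.
        async with global_sem, key_sems[key(item)]:
            await func(item)

    # Spawning everything eagerly is the unbounded-buffering hazard:
    # items for a saturated key just pile up waiting on its semaphore.
    await asyncio.gather(*(worker(item) for item in items))
```

A real implementation would instead need to pull from `items` lazily and decide when it's safe to stop pulling, which is where the questions above come in.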
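To make the meter-cleanup question concrete: the `MaxMeter` / `TokenBucketMeter` names come from the discussion above, but the bodies below are hypothetical sketches. The point is that "safe to drop" means "a freshly created meter would behave identically", which is trivial for a running-task counter and subtler for a token bucket.

```python
import time

class MaxMeter:
    """Sketch: caps concurrent tasks for one key."""
    def __init__(self, limit):
        self.limit = limit
        self.running = 0

    def is_droppable(self):
        # Safe to drop: no tasks currently running, and a fresh
        # MaxMeter would also start at running == 0.
        return self.running == 0

class TokenBucketMeter:
    """Sketch: rate-limits one key via a refilling token bucket."""
    def __init__(self, rate, capacity):
        self.rate = rate              # tokens refilled per second
        self.capacity = capacity
        self.tokens = capacity        # a fresh bucket starts full
        self.last = time.monotonic()

    def _refill(self):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now

    def is_droppable(self):
        # Only equivalent to a fresh meter once the bucket has
        # refilled to capacity; before that, dropping the state
        # would forget that tokens were recently spent.
        self._refill()
        return self.tokens >= self.capacity
</imports-placeholder>
```

So a cleanup pass can't just check "no tasks running"; it has to ask each meter type its own "am I indistinguishable from new?" question.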