-
Notifications
You must be signed in to change notification settings - Fork 205
fix Contextual typing works only with the first element in a union #793 #1392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
cc @samwgoldman @grievejia for review |
pyrefly/lib/alt/expr.rs
Outdated
| } else { | ||
| let elem_tys = self.elts_infer(&x.elts, elt_hint, errors); | ||
| self.stdlib.list(self.unions(elem_tys)).to_type() | ||
| if let Some(hint_ref) = hint.as_ref() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@asukaminato0721 do you remember why this change is needed for list, but set and dict work correctly out of the box given the unwrap.rs changes?
I'm not seeing why list is obviously different. Overall this looks pretty good to me, I'll try to get a second review on it as soon as I understand the list behavior a little better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now, list literals are the only container literals that we feed into boolean short‑circuit contexts.
In those expressions we don’t want to collapse a contextual hint like list[A] | list[C] into a single “soft” hint for their elements, because list_with_hint otherwise builds one list[self.unions(elem_tys)], i.e. list[A | C], and that loses the ability to prove the result is assignable to either list[A] or list[C].
The new branch iteration lets us reuse the prior decomposition work from unwrap.rs verbatim. Still, it also keeps each branch’s element inference isolated, so we can short‑circuit as soon as we find a branch whose fully built list is a subtype of the contextual branch.
Without that, even though decompose_list now collects per-branch hints, the final list(...) we return is still “the union of all element hints”, which is exactly what flagged list[A|C] vs list[A] | list[C] in boolop_soft.
Set and dict literals don’t currently appear in any places where we need that stronger guarantee. We still decompose their hints per branch, but after unions, the result stays a single set[...]/dict[...].
Because those expressions aren’t fed back into another branch-sensitive context (like boolop mixing container unions), we haven’t yet found a case that needs the same “try each branch independently and bail out on first success” treatment.
If/when we run into a repro similar to the list case—for example, x or {1} against set[int] | set[str]—we can apply the same pattern there. For now, the regression tests only exercised the list form, hence the localized change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing along some notes from @samwgoldman, who has thought far more about contextual typing (and general issues with our constraint solver) than anyone else on the Pyrefly team:
After reading the diff somewhat closely, I think the main idea is good, but the specific execution is off.
The main idea is that a union-like annotation with N branches should be treated as N hints, i.e. Hint(Vec<Type>) not Hint(Type). For example x: T | U = e should contextually type e in the context of Hint(vec![T, U]). When we decompose a lint, we should decompose each branch of the hint separately (to avoid Var confusion). When we go to check an expression w.r.t. some hint, we treat multiple branches disjunctively -- only one branch needs to match.
This diff fixes the issue with Var confusion during decomposition by splitting union hints (collect_var_from_hint), but then packages the result back up into a union (hint_from_types). This is why the ExprList case needs to match on union in order to actually implement the disjunctive contextual typing behavior. This has two problems:
-
It confuses the hint representations for
list[A] | list[B]andlist[A | B], since these are now the same. As a result, when checking against either case, we need to check against each oflist[A]andlist[B]andlist[A, B]. We might have cases likelist[BigUnionTypeAlias]which now check the RHS expression N+1 times unnecessarily. -
More importantly, if we don't check for the union case, we do the wrong thing! See the following cases:
xs: dict[int, int] | dict[str, str] = {} # error: `dict[int | str, int | str]` is not assignable to `dict[int, int] | dict[str, str]`
ys: set[int] | set[str] = reveal_type({1}) # error: `set[int | str]` is not assignable to `set[int] | set[str]`
The dict case is particularly bad, since we can't recover dict[int, int] | dict[str, str] from dict[int | str, int | str].
(From me, not Sam: our internal CI includes an error delta against Pytorch that we haven't figured out how to reproduce in github cheaply and another one against instagram and this issue does seem to cause a significant number of new errors in both codebases).
There are a few other issues:
- The disjunctive logic isn't disjunctive enough, and can leak errors between branches. We rarely emit errors when contextually typing expressions, but we do sometimes (for typed dict specifically).
from typing import TypedDict
class A(TypedDict):
x: int
class B(TypedDict):
x: str
xs: list[A] | list[B] = [{"x": "foo"}] # error: `Literal['foo']` is not assignable to TypedDict key `x` with type `int`
Note that the above errors when we consider the option of list[A], but then continues with the next one and actually succeeds. The error is from a failed disjunctive branch. This is a smaller bug, but one that needs to be fixed.
- This diff implements it's own kind of
map_over_unionlogic, but it only handles UnionType. We should probably also handle unions of the formtype[A | B]. We should also handle Quantified types with union-typed bounds (I'm not sure if we handle bounded quantifieds even in the non-union case yet, though...). Do we need to handle Var? I wouldn't expect to see Var from an annotation type, but maybe it comes up?
My preference is to more-or-less start over, using the same idea, but a different approach:
- Change the representation of
Hintto be a Vec1 of types in disjunctive normal form. - Instead of trying to break up the unions when we go to decompose, we should do this at construction of the hint. Decomposition doesn't create new unions, so this seems like the wrong place to split the unions.
I had some additional ideas to optimize the disjunctive logic, to avoid re-inferring expressions unnecessarily, but this is just an optimization.
Added a union-aware fast path in expr_infer_with_hint plus branch-aware list inference so we can explore hints that don’t contain type variables before the generic ones. Together with the new type_contains_var (now driven by Type::universe) and prefer_union_branch_without_vars, contextual typing no longer eagerly binds callable/lambda type vars when a non-generic branch of a union would have sufficed.
fix #793
Added collect_var_from_hint/hint_from_types to gather element candidates across union branches and reused them in the container decomposers to keep contextual hints from collapsing to the first branch
Taught list inference to try each union branch with its own hint before falling back, plus factored the existing logic into a reusable helper
Unblocked the regression test by removing the expected errors and bug marker now that the behavior is fixed