Fix: Allow list[str] for pandas DataFrame columns and index parameters #1664

Jack-GitHub12 · 2025-11-22T16:16:29Z

Summary

This fixes a type checking bug where Pyrefly incorrectly flagged errors when using list[str] for columns and index parameters in the pd.DataFrame constructor:

import pandas as pd
df = pd.DataFrame(columns=['a', 'b', 'c'], index=['a','b','c'])
# Error: Argument `list[str]` is not assignable to parameter `columns`
# Error: Argument `list[str]` is not assignable to parameter `index`

The root cause is pandas 2.x's incorrect SequenceNotStr protocol definition in typeshed. The protocol's index() method signature is:

def index(self, value: Any, /, start: int = 0, stop: int = ...) -> int

This makes only value position-only, but list.index() requires ALL parameters to be position-only. Since list.index(value, start, stop, /) doesn't structurally match the protocol (parameter kinds differ), list[str] fails to satisfy the protocol constraint.

The Fix

Added corrected pandas stubs to pyrefly's bundled typeshed matching the fix from pandas main branch
(targeted for pandas 2.3/3.0). The corrected SequenceNotStr protocol in pandas/_typing.pyi now defines:

def index(self, value: Any, start: int = ..., stop: int = ..., /) -> int

All parameters are position-only, matching list.index() signature. This allows list[str] to properly satisfy the SequenceNotStr[str] protocol used by DataFrame's columns and index parameters.

The stubs include:

pandas/_typing.pyi - Corrected SequenceNotStr protocol and type aliases
pandas/__init__.pyi - DataFrame, Series exports and common functions
pandas/core/frame.pyi - DataFrame class definition
pandas/core/series.pyi - Series class definition
METADATA.toml - Package metadata
- pandas/core/__init__.pyi - Core module stub
  Total: 6 stub files (82 lines) + test (140 lines)

Test Plan

Added regression tests in pyrefly/lib/test/pandas/dataframe.rs:

cargo test test_dataframe

Result: passes

references :

Fixes facebook#1657. This PR adds test cases demonstrating that list[str] can be used with pandas DataFrame columns and index parameters when using the corrected SequenceNotStr protocol definition from pandas main branch. ## The Issue When attempting to pass list[str] to DataFrame's columns or index parameters, type checkers reject the assignment due to a protocol incompatibility: ```python df = pd.DataFrame([[1, 2, 3]], columns=["A", "B", "C"]) # ERROR: Argument `list[str]` is not assignable to parameter `columns` ``` The issue is in pandas 2.x's SequenceNotStr protocol definition, where the index() method has mismatched parameter kinds: - pandas 2.x (broken): `def index(self, value: Any, /, start: int = 0, stop: int = ...) -> int` - list.index (actual): `def index(self, value, start=0, stop=sys.maxsize, /) -> int` Only `value` is position-only in the protocol, but all parameters are position-only in list.index, causing structural subtyping to fail. ## The Solution Pandas fixed this in their main branch (for 3.0) by making all parameters position-only to match the actual list.index signature: ```python def index(self, value: Any, start: int = ..., stop: int = ..., /) -> int ``` This PR adds test cases using inline stubs with the corrected protocol definition, verifying that the fix resolves the issue. ## Test Plan Added test suite in pyrefly/lib/test/pandas/dataframe.rs: - test_dataframe_list_str_columns: DataFrame with list[str] columns - test_dataframe_list_str_both: DataFrame with list[str] for both params Both tests use inline stubs with the corrected SequenceNotStr protocol. Verification: ``` cargo test test::pandas --lib # All tests pass ``` ## References - Issue: pandas-dev/pandas#56995 - Fixed in pandas main: https://github.com/pandas-dev/pandas/blob/main/pandas/_typing.py

This adds minimal pandas stubs with the corrected SequenceNotStr protocol to pyrefly's bundled typeshed, fixing issue facebook#1657 for all users. The stubs include: - Corrected SequenceNotStr protocol (all params position-only) - DataFrame and Series classes - Common read functions (read_csv, read_excel, read_json) - concat function These bundled stubs will be used for pandas 2.x installations and provide the fix until pandas 3.0 is released with the corrected protocol.

stroxler · 2025-11-22T17:43:22Z

Thanks for the PR, and for looking into Pandas to diagnose our issues!

Requesting review from @rchen152 because I think she has the best context on stub bundling - as a rule of thumb I don't think we would want to override released stubs, but if the typeshed stubs are badly broken for a library this important we might need to consider it.

We also might be able to hack Pyrefly internals a bit if necessary as an alternative - in theory I think an edge case in is_subset_eq might actually be easier to maintain than stub overrides.

Instead of bundling corrected pandas stubs, implement an edge case in is_subset_param_list that allows position-only parameters (PosOnly) to match regular positional parameters (Pos) in protocol checking. This fixes the pandas SequenceNotStr protocol compatibility issue where list.index() (which has position-only params) was failing to match SequenceNotStr.index() from pandas 2.x typeshed stubs (which incorrectly lacks position-only markers). The edge case is type-safe: if a protocol allows a parameter to be passed by position or keyword (Pos), an implementation that only allows positional (PosOnly) is more restrictive, which is acceptable for subtyping. Benefits over bundled stubs: - Single code change instead of maintaining separate stub files - Won't conflict with future typeshed updates - Easier to track and maintain - General improvement to protocol checking, not pandas-specific Changes: - Add edge case in pyrefly/lib/solver/subset.rs:123-136 - Add test case test_dataframe_with_broken_stubs to verify fix works with actual broken pandas 2.x stubs - Remove bundled pandas stubs (no longer needed) Fixes facebook#1657

Jack-GitHub12 · 2025-11-22T20:51:19Z

Hi @stroxler,

Thanks for the suggestion! I added an edge case in is_subset_eq that allows position-only parameters (PosOnly) to match regular positional parameters (Pos) in protocol checking.

Added test_dataframe_with_broken_stubs which uses the actual broken pandas 2.x stubs and verifies list[str] correctly matches SequenceNotStr[Any].

Testing

cargo test test_dataframe

Result: passes

rchen152 · 2025-11-24T19:36:06Z

pyrefly/lib/solver/subset.rs

+                // This handles cases like list.index() (which has position-only params) matching
+                // SequenceNotStr.index() from pandas 2.x typeshed stubs (which incorrectly
+                // lacks position-only markers, fixed in pandas 3.0).
+                // From a typing perspective, this is sound: if a protocol allows a parameter


This seems backwards to me. The implementation should be less restrictive, so that it can be used everywhere the protocol can be used.

Jack-GitHub12 added 2 commits November 22, 2025 02:20

meta-cla bot added the cla signed label Nov 22, 2025

stroxler requested a review from rchen152 November 22, 2025 17:41

yangdanny97 assigned stroxler Nov 23, 2025

rchen152 reviewed Nov 24, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix: Allow list[str] for pandas DataFrame columns and index parameters #1664

Fix: Allow list[str] for pandas DataFrame columns and index parameters #1664

Jack-GitHub12 commented Nov 22, 2025

Uh oh!

stroxler commented Nov 22, 2025 •

edited

Loading

Uh oh!

Jack-GitHub12 commented Nov 22, 2025

Uh oh!

rchen152 Nov 24, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Fix: Allow list[str] for pandas DataFrame columns and index parameters #1664

Are you sure you want to change the base?

Fix: Allow list[str] for pandas DataFrame columns and index parameters #1664

Conversation

Jack-GitHub12 commented Nov 22, 2025

Summary

The Fix

Test Plan

Uh oh!

stroxler commented Nov 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Jack-GitHub12 commented Nov 22, 2025

Testing

Uh oh!

rchen152 Nov 24, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

stroxler commented Nov 22, 2025 •

edited

Loading