[v2] Start type checking #3176
Conversation
```
# Conflicts:
#	mteb/abstasks/AbsTask.py
#	mteb/models/model_implementations/listconranker.py
#	pyproject.toml
```
Pull Request Overview
This PR introduces initial type checking setup for the MTEB project using mypy as referenced in issue #2714. The changes focus on establishing a basic type checking configuration and addressing fundamental typing issues across the codebase.
- Adds mypy configuration and type stub dependencies for comprehensive type checking
- Updates function signatures to include proper type annotations and return types
- Refactors descriptive statistics type hierarchy to support better type safety
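As an illustration of the second bullet, here is a minimal sketch of the kind of annotation change described (the function and its fields are hypothetical, not taken from the PR's diff):

```python
from __future__ import annotations


# Before: no annotations, so mypy infers `Any` and cannot check callers.
def scores_before(values):
    return {"accuracy": sum(values) / len(values)}


# After: explicit parameter and return types let mypy verify call sites.
def scores_after(values: list[float]) -> dict[str, float]:
    return {"accuracy": sum(values) / len(values)}
```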
Reviewed Changes
Copilot reviewed 32 out of 32 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| pyproject.toml | Adds mypy dependency group and configuration with module overrides |
| mteb/types/statistics.py | Refactors statistics type hierarchy with new base class |
| mteb/create_dataloaders.py | Updates function signatures and replaces `**kwargs` with explicit parameters |
| mteb/abstasks/*.py | Updates abstract task classes to use proper type annotations |
| mteb/_evaluators/*.py | Adds return type annotations to evaluator methods |
| mteb/models/model_implementations/listconranker.py | Removes optional parameter type |
| mteb/types/_encoder_io.py | Adds a `type: ignore` comment for multiple inheritance |
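A "statistics type hierarchy with a base class" (second table row) can be sketched with `TypedDict` inheritance; the class and field names below are illustrative assumptions, not the actual definitions in `mteb/types/statistics.py`:

```python
from typing import TypedDict


# Hypothetical base class: fields shared by all per-task statistics.
class DescriptiveStatistics(TypedDict):
    num_samples: int


# Hypothetical subclass: inherits num_samples, adds text-specific fields.
class TextStatistics(DescriptiveStatistics):
    average_text_length: float


stats: TextStatistics = {"num_samples": 10, "average_text_length": 42.5}
```

With this layout, mypy can check that every statistics dict carries the shared base fields while still typing the task-specific ones.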
```
# Conflicts:
#	mteb/_evaluators/clustering_evaluator.py
#	mteb/abstasks/AbsTaskAnyClustering.py
#	mteb/abstasks/AbsTaskMultilabelClassification.py
```
```diff
     encode_kwargs: dict[str, Any],
     test_cache: np.ndarray | None = None,
-) -> tuple[dict[str, float], Any]:
+) -> tuple[dict[str, float], np.ndarray | None]:
```
a docstring would be good here
Added
```diff
     {include-group = "lint"},
     {include-group = "test"},
 ]
+typing = [
```
Let us add a makefile command as well
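A minimal sketch of such a target (the target name and the paths passed to mypy are assumptions, chosen to match the `typing` dependency group added in this PR):

```makefile
.PHONY: typecheck

# Hypothetical Makefile target: run mypy over the package.
typecheck:
	mypy mteb/
```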
```diff
+[tool.mypy]
+plugins = ['pydantic.mypy']
+
+[[tool.mypy.overrides]]
```
Should we add some comments to these overrides - not quite sure what they do
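For context, a mypy override section scopes settings to specific modules; a common, commented use (the module name here is a made-up example, not from this PR) looks like:

```toml
# Hypothetical example: silence "missing library stubs" errors for
# third-party packages that ship no type information.
[[tool.mypy.overrides]]
module = ["some_untyped_package.*"]
ignore_missing_imports = true
```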
```diff
     task_metadata: TaskMetadata,
     input_column: str | None = None,
-    **dataloader_kwargs: dict[str, Any],
+    batch_size: int = 32,
```
Not `dataloader_kwargs`?
I don't think we need them; I think only `batch_size` is useful for us: https://docs.pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader
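A side note on the removed annotation: `**kwargs: dict[str, Any]` declares that every keyword *value* must be a dict, which is rarely intended; the conventional annotation is `**kwargs: Any`. A small self-contained sketch (`make_loader` is a hypothetical stand-in, not an mteb function):

```python
from typing import Any


# `**kwargs: Any` means each keyword value may be anything;
# `**kwargs: dict[str, Any]` would instead force every value to be a dict,
# so mypy would reject e.g. `num_workers=4`.
def make_loader(batch_size: int = 32, **kwargs: Any) -> dict[str, Any]:
    # Stand-in for forwarding options to a torch DataLoader.
    return {"batch_size": batch_size, **kwargs}
```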
What about num_workers?
I don't think we require it, since we aren't doing anything tricky, but I will test this.
```
# Conflicts:
#	mteb/abstasks/AbsTaskRetrieval.py
#	mteb/abstasks/AbsTaskTextRegression.py
#	mteb/abstasks/aggregated_task.py
```
Ref #2714
The goal is to make `task._evaluate_subset` consistent across tasks and with `Evaluator.__call__`, but classification and regression tasks return 2 items instead of 1.
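The shape mismatch can be illustrated with a minimal sketch (these functions are hypothetical simplifications, not the actual mteb implementations): most evaluations return only a scores dict, while classification/regression-style ones return the scores plus a second item.

```python
# One-item shape: scores only.
def evaluate_clustering() -> dict[str, float]:
    return {"v_measure": 0.5}


# Two-item shape: scores plus extra per-sample output, so a shared
# return annotation for `_evaluate_subset` cannot cover both cases.
def evaluate_classification() -> tuple[dict[str, float], list[float]]:
    return {"accuracy": 0.9}, [1.0, 0.0, 1.0]
```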