Conversation
Pull request overview
This pull request adds comprehensive type annotations to the cottoncandy codebase to improve type safety and code clarity. The changes include refactoring CloudStream from a class to a NamedTuple, updating the Python minimum version requirement from 3.7 to 3.9, and adding mypy type checking to the CI/CD pipeline.
Changes:
- Added type annotations for function signatures, parameters, return types, and variables throughout the codebase
- Refactored CloudStream from a class to a NamedTuple for improved immutability and type safety
- Updated Python minimum version from >=3.7 to >=3.9 to support modern type annotation syntax (PEP 585)
- Added mypy type checking to GitHub Actions workflow (with continue-on-error flag)
- Added py.typed marker file for PEP 561 compliance
- Fixed several bugs including incorrect raise statements, missing return statements, and missing else clauses
Reviewed changes
Copilot reviewed 15 out of 16 changed files in this pull request and generated 10 comments.
Summary per file:
| File | Description |
|---|---|
| pyproject.toml | Updated Python version requirement to >=3.9, added mypy and ipython test dependencies, added py.typed to package data |
| mypy.ini | Added mypy configuration with allow_redefinition and disabled import-untyped error |
| .github/workflows/run_tests.yml | Added mypy type checking step (continue-on-error) |
| cottoncandy/py.typed | Added PEP 561 marker file for type checking support |
| cottoncandy/backend.py | Refactored CloudStream to NamedTuple, added type annotations to abstract methods |
| cottoncandy/utils.py | Added comprehensive type annotations, fixed raise statement, added return statement to seek() |
| cottoncandy/interfaces.py | Added type annotations throughout, fixed missing return in cloud2dict, added else clause for sparse array types |
| cottoncandy/init.py | Added type annotations to module functions and variables |
| cottoncandy/s3client.py | Added type annotations to S3 backend methods |
| cottoncandy/localclient.py | Added type annotations to local filesystem backend methods |
| cottoncandy/gdriveclient.py | Added type ignore comments for dynamic IPython imports |
| cottoncandy/browser.py | Added type ignore comment for duplicate dir method |
| cottoncandy/tests/*.py | Added type annotations for test fixtures and variables |
| cottoncandy/tests/conftest.py | Added else clause to raise error for invalid client types |
Comments suppressed due to low confidence (2)
cottoncandy/utils.py:499
- GzipInputStream now inherits from BytesIO but doesn't call `super().__init__()` in its `__init__` method. This means the BytesIO parent class is never properly initialized, which could lead to issues if any BytesIO methods rely on the parent's initialization. Consider calling `super().__init__()` or reconsidering whether BytesIO inheritance is necessary here.
```python
class GzipInputStream(BytesIO):
    """Simple class that allow streaming reads from GZip files
    (from https://gist.github.com/beaufour/4205533).

    Python 2.x gzip.GZipFile relies on .seek() and .tell(), so it
    doesn't support this (@see: http://bo4.me/YKWSsL).

    Adapted from: http://effbot.org/librarybook/zlib-example-4.py
    """
    def __init__(self, fileobj: BinaryIO, block_size: int = 16384):
        """
        Initialize with the given file-like object.

        @param fileobj: file-like object,
        """
        self.BLOCK_SIZE = block_size  # Read block size
        # zlib window buffer size, set to gzip's format
        self.WINDOW_BUFFER_SIZE = 16 + zlib.MAX_WBITS
        self._file = fileobj
        self._zip = zlib.decompressobj(self.WINDOW_BUFFER_SIZE)
        self._offset = 0  # position in unzipped stream
        self._data = b''
```
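A minimal sketch of the suggested fix, with the `BytesIO` parent explicitly initialized. The class body mirrors the snippet above; the gzip round-trip at the bottom is only a smoke test added here for illustration, not part of the library:

```python
import gzip
import zlib
from io import BytesIO
from typing import BinaryIO

class GzipInputStream(BytesIO):
    """Sketch: same fields as the reviewed class, but the BytesIO
    parent is initialized before our own attributes are set."""
    def __init__(self, fileobj: BinaryIO, block_size: int = 16384):
        super().__init__()  # properly initialize BytesIO state
        self.BLOCK_SIZE = block_size
        self.WINDOW_BUFFER_SIZE = 16 + zlib.MAX_WBITS  # accept gzip headers
        self._file = fileobj
        self._zip = zlib.decompressobj(self.WINDOW_BUFFER_SIZE)
        self._offset = 0
        self._data = b''

# Smoke test: decompress a gzip payload through the stored decompressor
stream = GzipInputStream(BytesIO(gzip.compress(b'hello world')))
chunk = stream._zip.decompress(stream._file.read())
print(chunk)  # b'hello world'
```

Whether to keep the `BytesIO` base at all (versus composition over a plain object) is the larger design question the comment raises; the call to `super().__init__()` is just the minimal change.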
cottoncandy/interfaces.py:630
- Since the minimum Python version is now 3.9, the `six.PY2` check will always be False and the `six.PY3` check will always be True. This code can be simplified by removing the PY2 branch and the PY3 conditionals. Lines 597-598 could be simplified to just check `not array.flags['C_CONTIGUOUS']`, and lines 601-604 and 626-630 no longer need the `six.PY3` conditions.
```python
order: Literal['C', 'F'] = 'C' if array.flags.carray else 'F'
if ((not array.flags['%s_CONTIGUOUS' % order] and six.PY2) or  # type: ignore
        (not array.flags['C_CONTIGUOUS'] and six.PY3)):
    warn('Non-contiguous array. Creating copy (will use extra memory)...')
    if six.PY3 and order == 'F':
        # memoryview (PY3) vs buffer (PY2) issues
        warn("PY3: Changing array from 'F' to 'C' order")
        order = 'C'
    # create contiguous copy
    array = np.array(array, order=order)
meta = dict(dtype=array.dtype.str,
            shape=','.join(map(str, array.shape)),
            compression=str(compression),
            order=order)
# check for conflicts in metadata
metadata_keys = []
for k in metadata.keys():
    metadata_keys.append(k in meta)
assert not any(metadata_keys)
meta.update(metadata)
if compression is False:
    filestream = StringIO(array.data)
elif compression == 'gzip':
    if six.PY3 and array.flags['F_CONTIGUOUS']:
        # eventually, array.data below should be changed to np.getbuffer(array)
        # (not yet working in python3 numpy)
        # F-contiguous arrays break gzip in python 3
        array = array.T
```
```diff
@@ -397,24 +401,25 @@ def upload_file(self, file_name, cloud_name=None, permissions=DEFAULT_ACL, threa
                                 multipart_threshold = MPU_THRESHOLD)
     return s3_object.upload_file(file_name, ExtraArgs={'ACL': permissions}, Config = config)
```
The permissions parameter defaults to None, but it's directly passed to boto3's ExtraArgs={'ACL': permissions} at line 402. If permissions is None, this will pass None as the ACL value to boto3, which may cause unexpected behavior or errors. Consider adding a default value check or ensuring callers always provide a value.
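One hedged way to address this is to omit the `ACL` key entirely when no permission was given, so boto3 falls back to its default behavior. The helper name `build_extra_args` is hypothetical, not part of the PR; only the dict-building logic is shown:

```python
from typing import Optional

def build_extra_args(permissions: Optional[str]) -> dict:
    """Build a boto3 ExtraArgs dict, omitting 'ACL' when no
    permission was supplied (hypothetical helper for illustration)."""
    extra_args: dict = {}
    if permissions is not None:
        extra_args['ACL'] = permissions
    return extra_args

print(build_extra_args(None))           # {}
print(build_extra_args('public-read'))  # {'ACL': 'public-read'}
```

The caller would then pass `ExtraArgs=build_extra_args(permissions)` instead of the literal dict.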
```diff
     if six.PY3:
         if to.flags['F_CONTIGUOUS']:
             vw = to.T.view()
         else:
             vw = to.view()
         vw.shape = (-1,)  # Must be a ravel-able object
-        vw.dtype = np.dtype('uint8')  # 256 values in each byte
+        # 256 values in each byte
+        vw.dtype = np.dtype('uint8')  # type: ignore[misc]

     for ci in range(int(np.ceil(nbytes_total / float(buffersize)))):
         start = ci * buffersize
         end = min(nbytes_total, (ci + 1) * buffersize)
         if six.PY2:
             to.data[start:end] = frm.read(end - start)
         elif six.PY3:
             vw.data[start:end] = frm.read(end - start)
         else:
-            raise("Unknown python version")  # not sure six will ever do anything here (6=2x3)
+            raise Exception("Unknown python version")  # not sure six will ever do anything here (6=2x3)
```
The Python minimum version has been bumped to 3.9, but this code still checks for six.PY2 which will never be True. Since Python 2 is no longer supported, the six.PY2 branch at lines 468-469 is now dead code and should be removed. The entire function could be simplified to only handle Python 3.
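Under the new >=3.9 floor, the loop above reduces to its `vw.data` branch alone. A pure-Python sketch of the same chunked copy pattern follows, with a `bytearray` standing in for the numpy byte view so it runs without the library's types (the function name and buffer size are illustrative, not the codebase's):

```python
import math
from io import BytesIO

def copy_stream_into(buf: bytearray, frm, buffersize: int = 4) -> None:
    """Copy a stream into a preallocated buffer in fixed-size chunks.
    Python-3-only version of the loop: no six.PY2 branch needed."""
    view = memoryview(buf)
    nbytes_total = len(buf)
    for ci in range(math.ceil(nbytes_total / buffersize)):
        start = ci * buffersize
        end = min(nbytes_total, (ci + 1) * buffersize)
        view[start:end] = frm.read(end - start)

dest = bytearray(11)
copy_stream_into(dest, BytesIO(b'hello world'))
print(bytes(dest))  # b'hello world'
```

`math.ceil` on true division also replaces the `int(np.ceil(... / float(...)))` idiom, which was itself a Python 2 leftover.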
```diff
     return os.path.isfile(os.path.join(bucket_name, cloud_name))

-def upload_stream(self, stream, cloud_name, metadata, permissions, threads = 1):
+def upload_stream(self, stream: BinaryIO, cloud_name: str, metadata: dict[str, str], permissions: Optional[str] = None, threads: int = 1) -> None:
```
The metadata parameter is typed as dict[str, str] in this implementation, but the abstract method in backend.py:51 specifies just dict. While this is not strictly wrong (a more specific type for an implementation is acceptable), it would be better for consistency if the abstract method also specified dict[str, str] to match the implementations and the CloudStream metadata field type.
Suggested change:
```diff
-def upload_stream(self, stream: BinaryIO, cloud_name: str, metadata: dict[str, str], permissions: Optional[str] = None, threads: int = 1) -> None:
+def upload_stream(self, stream: BinaryIO, cloud_name: str, metadata: dict, permissions: Optional[str] = None, threads: int = 1) -> None:
```
```python
def move(self, source: str, destination: str, source_bucket: Optional[str] = None, destination_bucket: Optional[str] = None, overwrite: bool = False):
    new_ob = self.copy(source, destination, source_bucket, destination_bucket, overwrite)
    old_ob = self.get_s3_object(source, bucket_name = source_bucket)
    old_ob.delete()
    return new_ob
```
The move method is missing a return type annotation. According to the abstract method in backend.py:186, it should return bool, but this implementation returns a boto3 object (from the copy operation). This creates an inconsistency between the abstract method signature and the implementation.
Suggested change:
```diff
-def move(self, source: str, destination: str, source_bucket: Optional[str] = None, destination_bucket: Optional[str] = None, overwrite: bool = False):
+def move(self, source: str, destination: str, source_bucket: Optional[str] = None, destination_bucket: Optional[str] = None, overwrite: bool = False) -> bool:
     new_ob = self.copy(source, destination, source_bucket, destination_bucket, overwrite)
     old_ob = self.get_s3_object(source, bucket_name = source_bucket)
     old_ob.delete()
-    return new_ob
+    return True
```
```diff
-def move(self, source, destination, source_bucket, destination_bucket,
-         overwrite):
+def move(self, source: str, destination: str, source_bucket: Optional[str] = None, destination_bucket: Optional[str] = None, overwrite: bool = False) -> bool:
```
The abstract method signature in backend.py:186 requires source_bucket and destination_bucket to be non-optional str parameters, but this implementation makes them Optional[str] with default value None. This violates the Liskov Substitution Principle and creates an incompatibility between the abstract base class and its implementation.
```python
return obj.upload_fileobj(stream, ExtraArgs = {'ACL': permissions, 'Metadata': metadata},
                          Config = config)
```
The permissions parameter is Optional[str] but it's directly passed to boto3's ExtraArgs = {'ACL': permissions, ...} at line 351. If permissions is None, this will pass None as the ACL value to boto3, which may cause unexpected behavior or errors. Consider handling the None case explicitly or ensuring all callers provide a value.
Suggested change:
```diff
-return obj.upload_fileobj(stream, ExtraArgs = {'ACL': permissions, 'Metadata': metadata},
-                          Config = config)
+extra_args = {'Metadata': metadata}
+if permissions is not None:
+    extra_args['ACL'] = permissions
+return obj.upload_fileobj(stream, ExtraArgs = extra_args, Config = config)
```
```diff
 @abstractmethod
-def move(self, source, destination, source_bucket, destination_bucket, overwrite):
+def move(self, source: str, destination: str, source_bucket: str, destination_bucket: str, overwrite: bool) -> bool:
     """Moves an object
```
The abstract method signature requires source_bucket: str and destination_bucket: str to be non-optional parameters, but the implementations in both S3Client and LocalClient make them Optional[str] with default value None. The abstract method signature should be updated to match the implementations by making these parameters Optional with default None.
Suggested change:
```diff
-def move(self, source: str, destination: str, source_bucket: str, destination_bucket: str, overwrite: bool) -> bool:
+def move(self, source: str, destination: str, source_bucket: Optional[str] = None, destination_bucket: Optional[str] = None, overwrite: bool = False) -> bool:
```
(Note: `overwrite` needs a default value here, since it follows parameters with defaults.)
```diff
 @abstractmethod
-def list_objects(self):
+def list_objects(self) -> list[str]:
     """Gets all objects contained by backend
```
The abstract method specifies a return type of list[str], but the S3Client implementation at s3client.py:217 returns a boto3 response object (an iterator of S3 ObjectSummary objects), not a list of strings. The GDriveClient also returns different types depending on parameters. This creates a type mismatch between the abstract method and its implementations. The abstract method signature should either be updated to use a more general return type (like Any or a Union), or the implementations should be updated to return list[str].
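A sketch of the looser-abstract-method option (class and method names here are illustrative, not necessarily the PR's actual base class): return types are covariant, so mypy accepts an implementation that narrows `Iterable[Any]` to `list[str]`, while backends that yield boto3 summaries remain valid too.

```python
from abc import ABC, abstractmethod
from typing import Any, Iterable

class BasicInterface(ABC):
    """Illustrative stand-in for the abstract backend class."""
    @abstractmethod
    def list_objects(self) -> Iterable[Any]:
        """Gets all objects contained by backend."""

class LocalClient(BasicInterface):
    def list_objects(self) -> list[str]:  # narrower return type: OK for mypy
        return ['a/b.npy', 'c/d.json']

print(LocalClient().list_objects())  # ['a/b.npy', 'c/d.json']
```

The alternative, converting every backend's result to `list[str]`, would change runtime behavior for S3 callers that iterate ObjectSummary objects.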
```diff
 @clean_object_name
-def check_file_exists(self, object_name, bucket_name=None):
+def check_file_exists(self, cloud_name: str, bucket_name: Optional[str] = None) -> bool:
```
The `@clean_object_name` decorator expects the first parameter (after `self`) to be named `object_name`, but this method uses `cloud_name`. The decorator's wrapper function at utils.py:272 has a hardcoded parameter name `object_name`, which will cause a `TypeError` at runtime when the decorator attempts to call this method with `object_name` as a keyword argument.
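A minimal reproduction of that failure mode. The decorator below is a hypothetical stand-in for `clean_object_name`, assumed only to normalize the name and forward it under the hardcoded keyword `object_name`:

```python
import functools

def clean_object_name(func):
    """Stand-in decorator: normalizes the name, then forwards it
    under the hardcoded keyword 'object_name'."""
    @functools.wraps(func)
    def wrapper(self, object_name, **kwargs):
        cleaned = object_name.lstrip('/')
        return func(self, object_name=cleaned, **kwargs)
    return wrapper

class Client:
    @clean_object_name
    def check_file_exists(self, cloud_name, bucket_name=None):  # mismatched name
        return cloud_name

c = Client()
try:
    c.check_file_exists('/path/to/obj')
    result = 'ok'
except TypeError as exc:
    result = f'TypeError: {exc}'
print(result)  # TypeError: ... unexpected keyword argument 'object_name'
```

Renaming the parameter back to `object_name`, or making the wrapper forward the cleaned value positionally (`func(self, cleaned, **kwargs)`), would both avoid the mismatch.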
Commit: Fix decorator to pass through return type (force-pushed from df447e2 to 2dd7993).
This pull request improves type safety and code clarity across the codebase by adding or refining type annotations, using more precise types, and making some small API documentation updates.
Added a mypy type-checking step to .github/workflows/run_tests.yml, set to continue on error for now. Added `# type: ignore` comments where necessary to silence type checker warnings for dynamic imports or assignments, and updated docstrings to match new type annotations.