
ENH add type annotations (old)#114

Draft
kroq-gar78 wants to merge 27 commits into gallantlab:main from kroq-gar78:frontend-types

Conversation

@kroq-gar78 (Collaborator)

This pull request improves type safety and code clarity across the codebase by adding or refining type annotations, using more precise types, and making some small API documentation updates.

  • Added type annotations for function signatures, variables, and return types for public and private APIs.
  • Added a mypy type checking step to the GitHub Actions workflow in .github/workflows/run_tests.yml, set to continue on error for now.
  • Added # type: ignore comments where necessary to silence type checker warnings for dynamic imports or assignments, and updated docstrings to match new type annotations.
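As a sketch, a continue-on-error mypy step in a GitHub Actions workflow might look like the following; the step name and target path here are assumptions, not the exact contents of run_tests.yml:

```yaml
# hypothetical step; see .github/workflows/run_tests.yml for the real one
- name: Type check with mypy
  run: mypy cottoncandy/
  continue-on-error: true  # report type errors without failing the build
```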

Copilot AI (Contributor) left a comment

Pull request overview

This pull request adds comprehensive type annotations to the cottoncandy codebase to improve type safety and code clarity. The changes include refactoring CloudStream from a class to a NamedTuple, updating the Python minimum version requirement from 3.7 to 3.9, and adding mypy type checking to the CI/CD pipeline.

Changes:

  • Added type annotations for function signatures, parameters, return types, and variables throughout the codebase
  • Refactored CloudStream from a class to a NamedTuple for improved immutability and type safety
  • Updated Python minimum version from >=3.7 to >=3.9 to support modern type annotation syntax (PEP 585)
  • Added mypy type checking to GitHub Actions workflow (with continue-on-error flag)
  • Added py.typed marker file for PEP 561 compliance
  • Fixed several bugs including incorrect raise statements, missing return statements, and missing else clauses
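As background on the CloudStream change, a class-to-NamedTuple refactor along these lines yields immutable, type-annotated fields. The field set below is illustrative (the review comments only confirm a metadata field typed dict[str, str]), not the actual CloudStream definition:

```python
from typing import BinaryIO, NamedTuple, Optional

class CloudStream(NamedTuple):
    # Illustrative fields; only `metadata: dict[str, str]` is confirmed
    # by the review comments below.
    content: Optional[BinaryIO]
    metadata: dict[str, str]
```

Unlike a plain class, assigning to a field of a NamedTuple instance raises AttributeError, which is what makes the refactor attractive for type safety and immutability.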

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 10 comments.

Show a summary per file:

  • pyproject.toml: Updated Python version requirement to >=3.9, added mypy and ipython test dependencies, added py.typed to package data
  • mypy.ini: Added mypy configuration with allow_redefinition and disabled import-untyped error
  • .github/workflows/run_tests.yml: Added mypy type checking step (continue-on-error)
  • cottoncandy/py.typed: Added PEP 561 marker file for type checking support
  • cottoncandy/backend.py: Refactored CloudStream to NamedTuple, added type annotations to abstract methods
  • cottoncandy/utils.py: Added comprehensive type annotations, fixed raise statement, added return statement to seek()
  • cottoncandy/interfaces.py: Added type annotations throughout, fixed missing return in cloud2dict, added else clause for sparse array types
  • cottoncandy/__init__.py: Added type annotations to module functions and variables
  • cottoncandy/s3client.py: Added type annotations to S3 backend methods
  • cottoncandy/localclient.py: Added type annotations to local filesystem backend methods
  • cottoncandy/gdriveclient.py: Added type ignore comments for dynamic IPython imports
  • cottoncandy/browser.py: Added type ignore comment for duplicate dir method
  • cottoncandy/tests/*.py: Added type annotations for test fixtures and variables
  • cottoncandy/tests/conftest.py: Added else clause to raise error for invalid client types
Comments suppressed due to low confidence (2)

cottoncandy/utils.py:499

  • GzipInputStream now inherits from BytesIO but doesn't call super().__init__() in its __init__ method. This means the BytesIO parent class is never properly initialized, which could lead to issues if any BytesIO methods rely on the parent's initialization. Consider calling super().__init__() or reconsidering whether BytesIO inheritance is necessary here.
```python
class GzipInputStream(BytesIO):
    """Simple class that allows streaming reads from GZip files
    (from https://gist.github.com/beaufour/4205533).

    Python 2.x gzip.GZipFile relies on .seek() and .tell(), so it
    doesn't support this (@see: http://bo4.me/YKWSsL).

    Adapted from: http://effbot.org/librarybook/zlib-example-4.py
    """

    def __init__(self, fileobj: BinaryIO, block_size: int = 16384):
        """
        Initialize with the given file-like object.

        @param fileobj: file-like object,
        """
        self.BLOCK_SIZE = block_size  # Read block size
        # zlib window buffer size, set to gzip's format
        self.WINDOW_BUFFER_SIZE = 16 + zlib.MAX_WBITS

        self._file = fileobj
        self._zip = zlib.decompressobj(self.WINDOW_BUFFER_SIZE)
        self._offset = 0  # position in unzipped stream
        self._data = b''
```
cottoncandy/interfaces.py:630

  • Since the minimum Python version is now 3.9, the six.PY2 check will always be False and the six.PY3 check will always be True. This code can be simplified by removing the PY2 branch and the PY3 conditionals. Lines 597-598 could be simplified to just check not array.flags['C_CONTIGUOUS'], and lines 601-604 and 626-630 no longer need the six.PY3 conditions.
```python
        order: Literal['C', 'F'] = 'C' if array.flags.carray else 'F'
        if ((not array.flags['%s_CONTIGUOUS' % order] and six.PY2) or  # type: ignore
                (not array.flags['C_CONTIGUOUS'] and six.PY3)):
            warn('Non-contiguous array. Creating copy (will use extra memory)...')

            if six.PY3 and order == 'F':
                # memoryview (PY3) vs buffer (PY2) issues
                warn("PY3: Changing array from 'F' to 'C' order")
                order = 'C'

            # create contiguous copy
            array = np.array(array, order=order)

        meta = dict(dtype=array.dtype.str,
                    shape=','.join(map(str, array.shape)),
                    compression=str(compression),
                    order=order)

        # check for conflicts in metadata
        metadata_keys = []
        for k in metadata.keys():
            metadata_keys.append(k in meta)

        assert not any(metadata_keys)
        meta.update(metadata)

        if compression is False:
            filestream = StringIO(array.data)
        elif compression == 'gzip':
            if six.PY3 and array.flags['F_CONTIGUOUS']:
                # eventually, array.data below should be changed to np.getbuffer(array)
                # (not yet working in python3 numpy)
                # F-contiguous arrays break gzip in python 3
                array = array.T
```
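Under the new >=3.9 floor, the branch above collapses to a Python-3-only check, roughly like this sketch; the function name is hypothetical and mirrors the logic rather than the exact cottoncandy code:

```python
import warnings
from typing import Literal, Tuple

import numpy as np

def ensure_contiguous(array: np.ndarray) -> Tuple[np.ndarray, str]:
    """Python 3-only simplification of the six.PY2/PY3 branches above."""
    order: Literal['C', 'F'] = 'C' if array.flags.carray else 'F'
    if not array.flags['C_CONTIGUOUS']:
        warnings.warn('Non-contiguous array. Creating copy (will use extra memory)...')
        if order == 'F':
            # memoryview issues: serialize F-ordered arrays in C order
            warnings.warn("Changing array from 'F' to 'C' order")
            order = 'C'
        # create contiguous copy
        array = np.array(array, order=order)
    return array, order
```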


Comment thread cottoncandy/s3client.py
Comment on lines 378 to 402
@@ -397,24 +401,25 @@ def upload_file(self, file_name, cloud_name=None, permissions=DEFAULT_ACL, threa

```python
                                           multipart_threshold = MPU_THRESHOLD)
    return s3_object.upload_file(file_name, ExtraArgs={'ACL': permissions}, Config = config)
```

Copilot AI Feb 24, 2026


The permissions parameter defaults to None, but it's directly passed to boto3's ExtraArgs={'ACL': permissions} at line 402. If permissions is None, this will pass None as the ACL value to boto3, which may cause unexpected behavior or errors. Consider adding a default value check or ensuring callers always provide a value.
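A defensive pattern for this, sketched as a hypothetical helper (not an existing cottoncandy function): build the ExtraArgs dictionary so the ACL key is simply omitted when permissions is None.

```python
from typing import Any, Dict, Optional

def build_extra_args(metadata: Dict[str, str], permissions: Optional[str]) -> Dict[str, Any]:
    # Omit the ACL key entirely when no permission is given,
    # so boto3 never receives ACL=None.
    extra_args: Dict[str, Any] = {'Metadata': metadata}
    if permissions is not None:
        extra_args['ACL'] = permissions
    return extra_args
```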

Comment thread cottoncandy/utils.py
Comment on lines 455 to 473
```python
if six.PY3:
    if to.flags['F_CONTIGUOUS']:
        vw = to.T.view()
    else:
        vw = to.view()
    vw.shape = (-1,)  # Must be a ravel-able object
    # 256 values in each byte
    vw.dtype = np.dtype('uint8')  # type: ignore[misc]

for ci in range(int(np.ceil(nbytes_total / float(buffersize)))):
    start = ci * buffersize
    end = min(nbytes_total, (ci + 1) * buffersize)
    if six.PY2:
        to.data[start:end] = frm.read(end - start)
    elif six.PY3:
        vw.data[start:end] = frm.read(end - start)
    else:
        raise Exception("Unknown python version")  # not sure six will ever do anything here (6=2x3)
```


Copilot AI Feb 24, 2026


The Python minimum version has been bumped to 3.9, but this code still checks for six.PY2 which will never be True. Since Python 2 is no longer supported, the six.PY2 branch at lines 468-469 is now dead code and should be removed. The entire function could be simplified to only handle Python 3.
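A Python-3-only version of the loop could look like this sketch; the function name and signature are assumptions made so the example is self-contained:

```python
import io
from typing import BinaryIO

import numpy as np

def fill_array_from_stream(frm: BinaryIO, to: np.ndarray, buffersize: int = 64 * 2**20) -> None:
    """Copy raw bytes from `frm` into `to` in buffered chunks (Python 3 only)."""
    # Transpose F-ordered arrays so the flat view walks memory in order
    base = to.T if to.flags['F_CONTIGUOUS'] else to
    vw = base.reshape(-1).view(np.uint8)  # flat byte view of the destination
    nbytes_total = to.nbytes
    for ci in range(int(np.ceil(nbytes_total / float(buffersize)))):
        start = ci * buffersize
        end = min(nbytes_total, (ci + 1) * buffersize)
        vw.data[start:end] = frm.read(end - start)
```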

```python
    return os.path.isfile(os.path.join(bucket_name, cloud_name))

def upload_stream(self, stream, cloud_name, metadata, permissions, threads = 1):
def upload_stream(self, stream: BinaryIO, cloud_name: str, metadata: dict[str, str], permissions: Optional[str] = None, threads: int = 1) -> None:
```

Copilot AI Feb 24, 2026


The metadata parameter is typed as dict[str, str] in this implementation, but the abstract method in backend.py:51 specifies just dict. While this is not strictly wrong (a more specific type for an implementation is acceptable), it would be better for consistency if the abstract method also specified dict[str, str] to match the implementations and the CloudStream metadata field type.

Suggested change

```python
def upload_stream(self, stream: BinaryIO, cloud_name: str, metadata: dict[str, str], permissions: Optional[str] = None, threads: int = 1) -> None:
def upload_stream(self, stream: BinaryIO, cloud_name: str, metadata: dict, permissions: Optional[str] = None, threads: int = 1) -> None:
```

Comment thread cottoncandy/s3client.py
Comment on lines +436 to 440

```python
def move(self, source: str, destination: str, source_bucket: Optional[str] = None, destination_bucket: Optional[str] = None, overwrite: bool = False):
    new_ob = self.copy(source, destination, source_bucket, destination_bucket, overwrite)
    old_ob = self.get_s3_object(source, bucket_name = source_bucket)
    old_ob.delete()
    return new_ob
```

Copilot AI Feb 24, 2026


The move method is missing a return type annotation. According to the abstract method in backend.py:186, it should return bool, but this implementation returns a boto3 object (from the copy operation). This creates an inconsistency between the abstract method signature and the implementation.

Suggested change

```python
def move(self, source: str, destination: str, source_bucket: Optional[str] = None, destination_bucket: Optional[str] = None, overwrite: bool = False):
    new_ob = self.copy(source, destination, source_bucket, destination_bucket, overwrite)
    old_ob = self.get_s3_object(source, bucket_name = source_bucket)
    old_ob.delete()
    return new_ob
```

```python
def move(self, source: str, destination: str, source_bucket: Optional[str] = None, destination_bucket: Optional[str] = None, overwrite: bool = False) -> bool:
    new_ob = self.copy(source, destination, source_bucket, destination_bucket, overwrite)
    old_ob = self.get_s3_object(source, bucket_name = source_bucket)
    old_ob.delete()
    return True
```


```python
def move(self, source, destination, source_bucket, destination_bucket,
         overwrite):
def move(self, source: str, destination: str, source_bucket: Optional[str] = None, destination_bucket: Optional[str] = None, overwrite: bool = False) -> bool:
```

Copilot AI Feb 24, 2026


The abstract method signature in backend.py:186 requires source_bucket and destination_bucket to be non-optional str parameters, but this implementation makes them Optional[str] with default value None. This violates the Liskov Substitution Principle and creates an incompatibility between the abstract base class and its implementation.

Comment thread cottoncandy/s3client.py
Comment on lines 351 to 352
```python
return obj.upload_fileobj(stream, ExtraArgs = {'ACL': permissions, 'Metadata': metadata},
                          Config = config)
```

Copilot AI Feb 24, 2026


The permissions parameter is Optional[str] but it's directly passed to boto3's ExtraArgs = {'ACL': permissions, ...} at line 351. If permissions is None, this will pass None as the ACL value to boto3, which may cause unexpected behavior or errors. Consider handling the None case explicitly or ensuring all callers provide a value.

Suggested change

```python
return obj.upload_fileobj(stream, ExtraArgs = {'ACL': permissions, 'Metadata': metadata},
                          Config = config)
```

```python
extra_args = {'Metadata': metadata}
if permissions is not None:
    extra_args['ACL'] = permissions
return obj.upload_fileobj(stream, ExtraArgs = extra_args, Config = config)
```

Comment thread cottoncandy/backend.py
```python
@abstractmethod
def move(self, source, destination, source_bucket, destination_bucket, overwrite):
def move(self, source: str, destination: str, source_bucket: str, destination_bucket: str, overwrite: bool) -> bool:
    """Moves an object
```

Copilot AI Feb 24, 2026


The abstract method signature requires source_bucket: str and destination_bucket: str to be non-optional parameters, but the implementations in both S3Client and LocalClient make them Optional[str] with default value None. The abstract method signature should be updated to match the implementations by making these parameters Optional with default None.

Suggested change

```python
def move(self, source: str, destination: str, source_bucket: str, destination_bucket: str, overwrite: bool) -> bool:
def move(self, source: str, destination: str, source_bucket: Optional[str] = None, destination_bucket: Optional[str] = None, overwrite: bool = False) -> bool:
```
Comment thread cottoncandy/backend.py
```python
@abstractmethod
def list_objects(self):
def list_objects(self) -> list[str]:
    """Gets all objects contained by backend
```

Copilot AI Feb 24, 2026


The abstract method specifies a return type of list[str], but the S3Client implementation at s3client.py:217 returns a boto3 response object (an iterator of S3 ObjectSummary objects), not a list of strings. The GDriveClient also returns different types depending on parameters. This creates a type mismatch between the abstract method and its implementations. The abstract method signature should either be updated to use a more general return type (like Any or a Union), or the implementations should be updated to return list[str].

Comment thread cottoncandy/s3client.py
Comment on lines 123 to +124

```python
@clean_object_name
def check_file_exists(self, object_name, bucket_name=None):
def check_file_exists(self, cloud_name: str, bucket_name: Optional[str] = None) -> bool:
```

Copilot AI Feb 24, 2026


The @clean_object_name decorator expects the first parameter (after self) to be named object_name, but this method uses cloud_name. The decorator's wrapper function at utils.py:272 has a hardcoded parameter name object_name, which will cause a TypeError at runtime when the decorator attempts to call this method with object_name as a keyword argument.
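The failure mode can be reproduced with a stripped-down sketch of such a decorator; the real clean_object_name also sanitizes the name, and this version only models the hardcoded keyword described above:

```python
import functools

def clean_object_name(func):
    # Sketch: the wrapper forwards the first argument using the
    # hardcoded keyword `object_name`, as utils.py:272 is described to do.
    @functools.wraps(func)
    def wrapper(self, object_name, **kwargs):
        return func(self, object_name=object_name.lstrip('/'), **kwargs)
    return wrapper

class Client:
    @clean_object_name
    def check_file_exists(self, cloud_name, bucket_name=None):
        # parameter renamed: the decorator's keyword no longer matches
        return cloud_name
```

Calling Client().check_file_exists('/some/key') then raises a TypeError about an unexpected keyword argument 'object_name', which is exactly the runtime failure the comment predicts.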

@kroq-gar78 kroq-gar78 marked this pull request as draft March 29, 2026 00:34
@kroq-gar78 kroq-gar78 changed the title Add type annotations ENH add type annotations Mar 29, 2026
@kroq-gar78 kroq-gar78 changed the title ENH add type annotations ENH add type annotations (old) Apr 7, 2026