Conversation
Pull request overview
This pull request adds comprehensive type annotations to the cottoncandy codebase to improve type safety and code clarity. The changes include refactoring CloudStream from a class to a NamedTuple, updating the Python minimum version requirement from 3.7 to 3.9, and adding mypy type checking to the CI/CD pipeline.
Changes:
- Added type annotations for function signatures, parameters, return types, and variables throughout the codebase
- Refactored CloudStream from a class to a NamedTuple for improved immutability and type safety
- Updated Python minimum version from >=3.7 to >=3.9 to support modern type annotation syntax (PEP 585)
- Added mypy type checking to GitHub Actions workflow (with continue-on-error flag)
- Added py.typed marker file for PEP 561 compliance
- Fixed several bugs including incorrect raise statements, missing return statements, and missing else clauses
Reviewed changes
Copilot reviewed 15 out of 16 changed files in this pull request and generated 10 comments.
Summary per file:
| File | Description |
|---|---|
| pyproject.toml | Updated Python version requirement to >=3.9, added mypy and ipython test dependencies, added py.typed to package data |
| mypy.ini | Added mypy configuration with allow_redefinition and disabled import-untyped error |
| .github/workflows/run_tests.yml | Added mypy type checking step (continue-on-error) |
| cottoncandy/py.typed | Added PEP 561 marker file for type checking support |
| cottoncandy/backend.py | Refactored CloudStream to NamedTuple, added type annotations to abstract methods |
| cottoncandy/utils.py | Added comprehensive type annotations, fixed raise statement, added return statement to seek() |
| cottoncandy/interfaces.py | Added type annotations throughout, fixed missing return in cloud2dict, added else clause for sparse array types |
| cottoncandy/init.py | Added type annotations to module functions and variables |
| cottoncandy/s3client.py | Added type annotations to S3 backend methods |
| cottoncandy/localclient.py | Added type annotations to local filesystem backend methods |
| cottoncandy/gdriveclient.py | Added type ignore comments for dynamic IPython imports |
| cottoncandy/browser.py | Added type ignore comment for duplicate dir method |
| cottoncandy/tests/*.py | Added type annotations for test fixtures and variables |
| cottoncandy/tests/conftest.py | Added else clause to raise error for invalid client types |
Comments suppressed due to low confidence (2)
cottoncandy/utils.py:499
- GzipInputStream now inherits from BytesIO but doesn't call `super().__init__()` in its `__init__` method. This means the BytesIO parent class is never properly initialized, which could lead to issues if any BytesIO methods rely on the parent's initialization. Consider calling `super().__init__()` or reconsidering whether BytesIO inheritance is necessary here.
```python
class GzipInputStream(BytesIO):
    """Simple class that allow streaming reads from GZip files
    (from https://gist.github.com/beaufour/4205533).

    Python 2.x gzip.GZipFile relies on .seek() and .tell(), so it
    doesn't support this (@see: http://bo4.me/YKWSsL).

    Adapted from: http://effbot.org/librarybook/zlib-example-4.py
    """
    def __init__(self, fileobj: BinaryIO, block_size: int = 16384):
        """
        Initialize with the given file-like object.

        @param fileobj: file-like object,
        """
        self.BLOCK_SIZE = block_size  # Read block size
        # zlib window buffer size, set to gzip's format
        self.WINDOW_BUFFER_SIZE = 16 + zlib.MAX_WBITS
        self._file = fileobj
        self._zip = zlib.decompressobj(self.WINDOW_BUFFER_SIZE)
        self._offset = 0  # position in unzipped stream
        self._data = b''
```
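A minimal sketch of the suggested fix, with the `BytesIO` parent explicitly initialized. The class body mirrors the snippet above; the gzip round-trip at the bottom is only a smoke test added here for illustration, not part of the library:

```python
import gzip
import zlib
from io import BytesIO
from typing import BinaryIO

class GzipInputStream(BytesIO):
    """Sketch: same fields as the reviewed class, but the BytesIO
    parent is initialized before our own attributes are set."""
    def __init__(self, fileobj: BinaryIO, block_size: int = 16384):
        super().__init__()  # properly initialize BytesIO state
        self.BLOCK_SIZE = block_size
        self.WINDOW_BUFFER_SIZE = 16 + zlib.MAX_WBITS  # accept gzip headers
        self._file = fileobj
        self._zip = zlib.decompressobj(self.WINDOW_BUFFER_SIZE)
        self._offset = 0
        self._data = b''

# Smoke test: decompress a gzip payload through the stored decompressor
stream = GzipInputStream(BytesIO(gzip.compress(b'hello world')))
chunk = stream._zip.decompress(stream._file.read())
print(chunk)  # b'hello world'
```

Whether to keep the `BytesIO` base at all (versus composition over a plain object) is the larger design question the comment raises; the call to `super().__init__()` is just the minimal change.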
cottoncandy/interfaces.py:630
- Since the minimum Python version is now 3.9, the `six.PY2` check will always be False and the `six.PY3` check will always be True. This code can be simplified by removing the PY2 branch and the PY3 conditionals. Lines 597-598 could be simplified to just check `not array.flags['C_CONTIGUOUS']`, and lines 601-604 and 626-630 no longer need the `six.PY3` conditions.
```python
order: Literal['C', 'F'] = 'C' if array.flags.carray else 'F'
if ((not array.flags['%s_CONTIGUOUS' % order] and six.PY2) or  # type: ignore
        (not array.flags['C_CONTIGUOUS'] and six.PY3)):
    warn('Non-contiguous array. Creating copy (will use extra memory)...')
    if six.PY3 and order == 'F':
        # memoryview (PY3) vs buffer (PY2) issues
        warn("PY3: Changing array from 'F' to 'C' order")
        order = 'C'
    # create contiguous copy
    array = np.array(array, order=order)
meta = dict(dtype=array.dtype.str,
            shape=','.join(map(str, array.shape)),
            compression=str(compression),
            order=order)
# check for conflicts in metadata
metadata_keys = []
for k in metadata.keys():
    metadata_keys.append(k in meta)
assert not any(metadata_keys)
meta.update(metadata)
if compression is False:
    filestream = StringIO(array.data)
elif compression == 'gzip':
    if six.PY3 and array.flags['F_CONTIGUOUS']:
        # eventually, array.data below should be changed to np.getbuffer(array)
        # (not yet working in python3 numpy)
        # F-contiguous arrays break gzip in python 3
        array = array.T
```
```diff
@@ -397,24 +401,25 @@ def upload_file(self, file_name, cloud_name=None, permissions=DEFAULT_ACL, threa
                                 multipart_threshold = MPU_THRESHOLD)
     return s3_object.upload_file(file_name, ExtraArgs={'ACL': permissions}, Config = config)
```
The permissions parameter defaults to None, but it's directly passed to boto3's ExtraArgs={'ACL': permissions} at line 402. If permissions is None, this will pass None as the ACL value to boto3, which may cause unexpected behavior or errors. Consider adding a default value check or ensuring callers always provide a value.
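One hedged way to address this is to omit the `ACL` key entirely when no permission was given, so boto3 falls back to its default behavior. The helper name `build_extra_args` is hypothetical, not part of the PR; only the dict-building logic is shown:

```python
from typing import Optional

def build_extra_args(permissions: Optional[str]) -> dict:
    """Build a boto3 ExtraArgs dict, omitting 'ACL' when no
    permission was supplied (hypothetical helper for illustration)."""
    extra_args: dict = {}
    if permissions is not None:
        extra_args['ACL'] = permissions
    return extra_args

print(build_extra_args(None))           # {}
print(build_extra_args('public-read'))  # {'ACL': 'public-read'}
```

The caller would then pass `ExtraArgs=build_extra_args(permissions)` instead of the literal dict.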
```diff
     if six.PY3:
         if to.flags['F_CONTIGUOUS']:
             vw = to.T.view()
         else:
             vw = to.view()
         vw.shape = (-1,)  # Must be a ravel-able object
-        vw.dtype = np.dtype('uint8')  # 256 values in each byte
+        # 256 values in each byte
+        vw.dtype = np.dtype('uint8')  # type: ignore[misc]

     for ci in range(int(np.ceil(nbytes_total / float(buffersize)))):
         start = ci * buffersize
         end = min(nbytes_total, (ci + 1) * buffersize)
         if six.PY2:
             to.data[start:end] = frm.read(end - start)
         elif six.PY3:
             vw.data[start:end] = frm.read(end - start)
         else:
-            raise("Unknown python version")  # not sure six will ever do anything here (6=2x3)
+            raise Exception("Unknown python version")  # not sure six will ever do anything here (6=2x3)
```
The Python minimum version has been bumped to 3.9, but this code still checks for six.PY2 which will never be True. Since Python 2 is no longer supported, the six.PY2 branch at lines 468-469 is now dead code and should be removed. The entire function could be simplified to only handle Python 3.
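Under the new >=3.9 floor, the loop above reduces to its `vw.data` branch alone. A pure-Python sketch of the same chunked copy pattern follows, with a `bytearray` standing in for the numpy byte view so it runs without the library's types (the function name and buffer size are illustrative, not the codebase's):

```python
import math
from io import BytesIO

def copy_stream_into(buf: bytearray, frm, buffersize: int = 4) -> None:
    """Copy a stream into a preallocated buffer in fixed-size chunks.
    Python-3-only version of the loop: no six.PY2 branch needed."""
    view = memoryview(buf)
    nbytes_total = len(buf)
    for ci in range(math.ceil(nbytes_total / buffersize)):
        start = ci * buffersize
        end = min(nbytes_total, (ci + 1) * buffersize)
        view[start:end] = frm.read(end - start)

dest = bytearray(11)
copy_stream_into(dest, BytesIO(b'hello world'))
print(bytes(dest))  # b'hello world'
```

`math.ceil` on true division also replaces the `int(np.ceil(... / float(...)))` idiom, which was itself a Python 2 leftover.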
```diff
     return os.path.isfile(os.path.join(bucket_name, cloud_name))

-def upload_stream(self, stream, cloud_name, metadata, permissions, threads = 1):
+def upload_stream(self, stream: BinaryIO, cloud_name: str, metadata: dict[str, str], permissions: Optional[str] = None, threads: int = 1) -> None:
```
The metadata parameter is typed as dict[str, str] in this implementation, but the abstract method in backend.py:51 specifies just dict. While this is not strictly wrong (a more specific type for an implementation is acceptable), it would be better for consistency if the abstract method also specified dict[str, str] to match the implementations and the CloudStream metadata field type.
Suggested change:
```diff
-def upload_stream(self, stream: BinaryIO, cloud_name: str, metadata: dict[str, str], permissions: Optional[str] = None, threads: int = 1) -> None:
+def upload_stream(self, stream: BinaryIO, cloud_name: str, metadata: dict, permissions: Optional[str] = None, threads: int = 1) -> None:
```
```python
def move(self, source: str, destination: str, source_bucket: Optional[str] = None, destination_bucket: Optional[str] = None, overwrite: bool = False):
    new_ob = self.copy(source, destination, source_bucket, destination_bucket, overwrite)
    old_ob = self.get_s3_object(source, bucket_name = source_bucket)
    old_ob.delete()
    return new_ob
```
The move method is missing a return type annotation. According to the abstract method in backend.py:186, it should return bool, but this implementation returns a boto3 object (from the copy operation). This creates an inconsistency between the abstract method signature and the implementation.
Suggested change:
```diff
-def move(self, source: str, destination: str, source_bucket: Optional[str] = None, destination_bucket: Optional[str] = None, overwrite: bool = False):
+def move(self, source: str, destination: str, source_bucket: Optional[str] = None, destination_bucket: Optional[str] = None, overwrite: bool = False) -> bool:
     new_ob = self.copy(source, destination, source_bucket, destination_bucket, overwrite)
     old_ob = self.get_s3_object(source, bucket_name = source_bucket)
     old_ob.delete()
-    return new_ob
+    return True
```
```diff
-def move(self, source, destination, source_bucket, destination_bucket,
-         overwrite):
+def move(self, source: str, destination: str, source_bucket: Optional[str] = None, destination_bucket: Optional[str] = None, overwrite: bool = False) -> bool:
```
The abstract method signature in backend.py:186 requires source_bucket and destination_bucket to be non-optional str parameters, but this implementation makes them Optional[str] with default value None. This violates the Liskov Substitution Principle and creates an incompatibility between the abstract base class and its implementation.
```python
return obj.upload_fileobj(stream, ExtraArgs = {'ACL': permissions, 'Metadata': metadata},
                          Config = config)
```
The permissions parameter is Optional[str] but it's directly passed to boto3's ExtraArgs = {'ACL': permissions, ...} at line 351. If permissions is None, this will pass None as the ACL value to boto3, which may cause unexpected behavior or errors. Consider handling the None case explicitly or ensuring all callers provide a value.
Suggested change:
```diff
-return obj.upload_fileobj(stream, ExtraArgs = {'ACL': permissions, 'Metadata': metadata},
-                          Config = config)
+extra_args = {'Metadata': metadata}
+if permissions is not None:
+    extra_args['ACL'] = permissions
+return obj.upload_fileobj(stream, ExtraArgs = extra_args, Config = config)
```
```diff
 @abstractmethod
-def move(self, source, destination, source_bucket, destination_bucket, overwrite):
+def move(self, source: str, destination: str, source_bucket: str, destination_bucket: str, overwrite: bool) -> bool:
     """Moves an object
```
The abstract method signature requires source_bucket: str and destination_bucket: str to be non-optional parameters, but the implementations in both S3Client and LocalClient make them Optional[str] with default value None. The abstract method signature should be updated to match the implementations by making these parameters Optional with default None.
Suggested change:
```diff
-def move(self, source: str, destination: str, source_bucket: str, destination_bucket: str, overwrite: bool) -> bool:
+def move(self, source: str, destination: str, source_bucket: Optional[str] = None, destination_bucket: Optional[str] = None, overwrite: bool = False) -> bool:
```
(Note: `overwrite` needs a default value here, since it follows parameters with defaults.)
```diff
 @abstractmethod
-def list_objects(self):
+def list_objects(self) -> list[str]:
     """Gets all objects contained by backend
```
The abstract method specifies a return type of list[str], but the S3Client implementation at s3client.py:217 returns a boto3 response object (an iterator of S3 ObjectSummary objects), not a list of strings. The GDriveClient also returns different types depending on parameters. This creates a type mismatch between the abstract method and its implementations. The abstract method signature should either be updated to use a more general return type (like Any or a Union), or the implementations should be updated to return list[str].
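A sketch of the looser-abstract-method option (class and method names here are illustrative, not necessarily the PR's actual base class): return types are covariant, so mypy accepts an implementation that narrows `Iterable[Any]` to `list[str]`, while backends that yield boto3 summaries remain valid too.

```python
from abc import ABC, abstractmethod
from typing import Any, Iterable

class BasicInterface(ABC):
    """Illustrative stand-in for the abstract backend class."""
    @abstractmethod
    def list_objects(self) -> Iterable[Any]:
        """Gets all objects contained by backend."""

class LocalClient(BasicInterface):
    def list_objects(self) -> list[str]:  # narrower return type: OK for mypy
        return ['a/b.npy', 'c/d.json']

print(LocalClient().list_objects())  # ['a/b.npy', 'c/d.json']
```

The alternative, converting every backend's result to `list[str]`, would change runtime behavior for S3 callers that iterate ObjectSummary objects.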
```diff
 @clean_object_name
-def check_file_exists(self, object_name, bucket_name=None):
+def check_file_exists(self, cloud_name: str, bucket_name: Optional[str] = None) -> bool:
```
The `@clean_object_name` decorator expects the first parameter (after `self`) to be named `object_name`, but this method uses `cloud_name`. The decorator's wrapper function at utils.py:272 has a hardcoded parameter name `object_name`, which will cause a `TypeError` at runtime when the decorator attempts to call this method with `object_name` as a keyword argument.
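A minimal reproduction of that failure mode. The decorator below is a hypothetical stand-in for `clean_object_name`, assumed only to normalize the name and forward it under the hardcoded keyword `object_name`:

```python
import functools

def clean_object_name(func):
    """Stand-in decorator: normalizes the name, then forwards it
    under the hardcoded keyword 'object_name'."""
    @functools.wraps(func)
    def wrapper(self, object_name, **kwargs):
        cleaned = object_name.lstrip('/')
        return func(self, object_name=cleaned, **kwargs)
    return wrapper

class Client:
    @clean_object_name
    def check_file_exists(self, cloud_name, bucket_name=None):  # mismatched name
        return cloud_name

c = Client()
try:
    c.check_file_exists('/path/to/obj')
    result = 'ok'
except TypeError as exc:
    result = f'TypeError: {exc}'
print(result)  # TypeError: ... unexpected keyword argument 'object_name'
```

Renaming the parameter back to `object_name`, or making the wrapper forward the cleaned value positionally (`func(self, cleaned, **kwargs)`), would both avoid the mismatch.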
Commit: Fix decorator to pass through return type (force-pushed from df447e2 to 2dd7993).
This pull request improves type safety and code clarity across the codebase by adding or refining type annotations, using more precise types, and making some small API documentation updates.
Added a mypy type-checking step to .github/workflows/run_tests.yml, set to continue on error for now. Added `# type: ignore` comments where necessary to silence type checker warnings for dynamic imports or assignments, and updated docstrings to match new type annotations.