aebrahim

We keep the same 32-digit hexadecimal string format.

Fixes #510.
Relevant to #453 and apache/beam#21298

@claudevdm

Hi @aebrahim, this approach leads to errors in Apache Beam.

For example, in Apache Beam:

  1. cloudpickle is used to serialize dynamic types during pipeline submission time (could be on a user's workstation)
  2. distributed workers also use cloudpickle to serialize dynamic types

During step (1), some user-defined type claims tracking ID 1.
During step (2), the worker imports the cloudpickle library and serializes some dynamic type, but it has no way of knowing that tracking ID 1 was already claimed in a separate Python process. Now two different types are encoded with tracking ID 1.
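To make the collision concrete, here is a minimal sketch. It assumes IDs are handed out sequentially per process (as the "tracking id 1" wording suggests); `CounterIdRegistry` and `claim_id` are hypothetical names for illustration, not part of cloudpickle. Two registries stand in for the two separate Python processes.

```python
import itertools
import uuid


class CounterIdRegistry:
    """Hypothetical stand-in for a per-process tracker with sequential IDs."""

    def __init__(self):
        # Every process starts counting from 1, unaware of other processes.
        self._counter = itertools.count(1)
        self.ids_by_type = {}

    def claim_id(self, cls):
        # Reuse the ID if this type was seen before, otherwise take the next one.
        if cls not in self.ids_by_type:
            self.ids_by_type[cls] = format(next(self._counter), "032x")
        return self.ids_by_type[cls]


# Two registries simulate two independent Python processes.
submission_process = CounterIdRegistry()  # step (1): user's workstation
worker_process = CounterIdRegistry()      # step (2): distributed worker

UserType = type("UserType", (), {})
WorkerType = type("WorkerType", (), {})

submitted_id = submission_process.claim_id(UserType)
worker_id = worker_process.claim_id(WorkerType)

# Two unrelated types end up with the same 32-digit hex ID.
collision = submitted_id == worker_id

# Random 32-digit hex IDs are drawn independently in each process,
# so the same clash is practically impossible.
random_ids_differ = uuid.uuid4().hex != uuid.uuid4().hex
```

In this sketch both registries emit the same ID for different types, which is the failure described above.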

@aebrahim
Author

aebrahim commented Jun 3, 2025

Got it, so the approach taken needs to be some kind of deterministic hash?

@claudevdm

Yes. Perhaps there are ways to strike a balance here, depending on the use case:

  • The user of cloudpickle can pass their own "generate id" function to be used when pickling a dynamic type, while the default stays uuid (the proven way to guarantee isinstance semantics).
  • There could be an option to forgo isinstance semantics completely (don't use tracking IDs at all and always create a new type) when determinism matters more than isinstance semantics.
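The two options could look roughly like the sketch below. Everything here is hypothetical: `ToyTracker`, its `generate_id` and `track` parameters, and `deterministic_id_generator` are illustrative names, not cloudpickle's API. The deterministic generator hashes the class's module, qualified name, and attribute names, which is only one possible choice of input.

```python
import hashlib
import uuid
from typing import Callable, Dict, Optional


def default_id_generator(cls: type) -> str:
    # Random 32-digit hex ID: unique per process, not reproducible.
    return uuid.uuid4().hex


def deterministic_id_generator(cls: type) -> str:
    # Assumption: module, qualified name, and attribute names identify the type.
    attrs = sorted(name for name in vars(cls) if not name.startswith("__"))
    description = f"{cls.__module__}.{cls.__qualname__}:{','.join(attrs)}"
    # Truncate the SHA-256 digest to keep the 32-digit hex format.
    return hashlib.sha256(description.encode("utf-8")).hexdigest()[:32]


class ToyTracker:
    """Hypothetical tracker with a pluggable ID function and an opt-out."""

    def __init__(
        self,
        generate_id: Optional[Callable[[type], str]] = None,
        track: bool = True,
    ):
        self._generate_id = generate_id or default_id_generator
        self._track = track
        self._ids: Dict[type, str] = {}

    def id_for(self, cls: type) -> Optional[str]:
        if not self._track:
            # No tracking ID: the receiver would always build a new type.
            return None
        if cls not in self._ids:
            self._ids[cls] = self._generate_id(cls)
        return self._ids[cls]


# Two trackers simulate two processes defining the same dynamic type.
PointA = type("Point", (), {"x": 0, "y": 0})
PointB = type("Point", (), {"x": 0, "y": 0})

id_a = ToyTracker(generate_id=deterministic_id_generator).id_for(PointA)
id_b = ToyTracker(generate_id=deterministic_id_generator).id_for(PointB)
same_across_processes = id_a == id_b

default_id = ToyTracker().id_for(PointA)
untracked_id = ToyTracker(track=False).id_for(PointA)
```

A hash like this trades one problem for another: structurally identical but unrelated types would share an ID, so whether it is acceptable depends on the use case.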


Development

Successfully merging this pull request may close these issues.

Random class_tracker_id for dynamic class
