Skip to content

Conversation

@tomMoral
Copy link
Contributor

@tomMoral tomMoral commented Jul 2, 2020

Fix #259

Not sure if this is the right fix or if we should fix this in cloudpickle to keep the former behavior?

@mgorny
Copy link
Contributor

mgorny commented Jul 2, 2020

Thank you for the quick patch.

@pierreglaser
Copy link
Collaborator

Thank you for the fix @tomMoral!

The tests are failing because
https://github.com/joblib/loky/blob/master/loky/backend/reduction.py#L181 conflicts with
https://github.com/cloudpipe/cloudpickle/blob/master/cloudpickle/cloudpickle_fast.py#L479 -
the CloudPickler._dispatch_table class attribute is overridden by CustomizablePickler._dispatch_tableand, as a consequence, the CustomizablePickler.dispatch_table ChainMap, which is essentially a view over a list of dictionaries including cls._dispatch_table loses the reducers included in CloudPickler._dispatch_table.

As a temporary fix, CustomizablePickler._dispatch_table should be renamed CustomizablePickler._loky_dispatch_table so that it does not collide with CloudPickler._dispatch_table.

@ogrisel
Copy link
Contributor

ogrisel commented Sep 10, 2020

@pierreglaser I think this is more complex than this. I have tried many alternatives that do not override the class level _dispatch_table attribute and it still fails on Python 3.8.

Also in recent cloudpickle, dispatch_table (the ChainMap) is also a class attribute so it's actually independent of any subsequent setattr on _dispatch_table (it still holds a ref on the original CloudPickler._dispatch_table).

That being said I have no idea about the true cause of the problem on Python 3.8. Not sure how to debug this.

Also note that Cloudpickler._dispatch_table and loky's _ReducerRegistry hold reducers for methods and classmethod. I have tried to reverse the order as follows but that does not fix the problem:

diff --git a/loky/backend/reduction.py b/loky/backend/reduction.py
index 0bad5f6..6a6c3df 100644
--- a/loky/backend/reduction.py
+++ b/loky/backend/reduction.py
@@ -175,11 +175,6 @@ def set_loky_pickler(loky_pickler=None):
             # a reference to the class dictionary under Python 2
             _dispatch = loky_pickler_cls.dispatch.copy()
             _dispatch.update(_ReducerRegistry.dispatch_table)
-        else:
-            # Under Python 3 initialize the dispatch table with a copy of the
-            # default registry
-            _dispatch_table = copyreg.dispatch_table.copy()
-            _dispatch_table.update(_ReducerRegistry.dispatch_table)
 
         def __init__(self, writer, reducers=None, protocol=HIGHEST_PROTOCOL):
             loky_pickler_cls.__init__(self, writer, protocol=protocol)
@@ -187,27 +182,21 @@ def set_loky_pickler(loky_pickler=None):
                 reducers = {}
             if sys.version_info < (3,):
                 self.dispatch = self._dispatch.copy()
-            else:
-                if getattr(self, "dispatch_table", None) is not None:
-                    self.dispatch_table.update(self._dispatch_table.copy())
-                else:
-                    self.dispatch_table = self._dispatch_table.copy()
-
-            for type, reduce_func in reducers.items():
-                self.register(type, reduce_func)
-
-        def register(self, type, reduce_func):
-            """Attach a reducer function to a given type in the dispatch table.
-            """
-            if sys.version_info < (3,):
-                # Python 2 pickler dispatching is not explicitly customizable.
-                # Let us use a closure to workaround this limitation.
+                for type, reduce_func in reducers.items():
                     def dispatcher(self, obj):
                         reduced = reduce_func(obj)
                         self.save_reduce(obj=obj, *reduced)
                     self.dispatch[type] = dispatcher
             else:
-                self.dispatch_table[type] = reduce_func
+                new_dispatch_table = copyreg.dispatch_table.copy()
+                new_dispatch_table.update(_ReducerRegistry.dispatch_table)
+                if hasattr(self, "dispatch_table"):
+                    # Recent versions of CloudPickler.dispatch_table provide a
+                    # different way to pickle method and classmethod
+                    new_dispatch_table.update(self.dispatch_table)
+                self.dispatch_table = new_dispatch_table
+                for type, reduce_func in reducers.items():
+                    self.dispatch_table[type] = reduce_func
 
     _LokyPickler = CustomizablePickler
     _loky_pickler_name = loky_pickler

but I still get (on Python 3.8):

E               Traceback (most recent call last):
E                 File "/home/ogrisel/code/loky/loky/process_executor.py", line 404, in _process_worker
E                   call_item = call_queue.get(block=True, timeout=timeout)
E                 File "/home/ogrisel/miniconda3/envs/pylatest/lib/python3.8/multiprocessing/queues.py", line 111, in get
E                   res = self._recv_bytes()
E                 File "/home/ogrisel/miniconda3/envs/pylatest/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
E                   buf = self._recv_bytes(maxlength)
E                 File "/home/ogrisel/miniconda3/envs/pylatest/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
E                   buf = self._recv(4)
E                 File "/home/ogrisel/miniconda3/envs/pylatest/lib/python3.8/multiprocessing/connection.py", line 379, in _recv
E                   chunk = read(handle, remaining)
E               OSError: [Errno 9] Bad file descriptor

@ogrisel
Copy link
Contributor

ogrisel commented Sep 10, 2020

Also I get the same error when I comment out the line that does:

new_dispatch_table.update(self.dispatch_table)

@pierreglaser
Copy link
Collaborator

pierreglaser commented Sep 22, 2020

So after IRL debugging with @ogrisel yesterday and a bit more investigation, I think I went to the bottom of this.

The reason the tests are failing right now is due to a very subtle side effect in _pickle.c. Specifically, it is due to this line:
in Pickler___init__:

    if (_PyObject_LookupAttrId((PyObject *)self,
                                    &PyId_dispatch_table, &self->dispatch_table) < 0) {
        return -1;

Side note: recall that the _pickle.Pickler class (builtin) exposes a dispatch_table member descriptor. Member descriptors allow Python users to access struct-level attributes of Python builtin-types.
In our case, dispatch_table is a member of the PicklerObject struct, and accessing this member for a specific instance is provided by the class-level dispatch_table member descriptor.

Now, during Pickler___init__, the interpreter does an "dispatch_table" attribute access attempt on the current Pickler instance through the _PyObject_LookupAttrId function call, which tries to find an attributed named "dispatch_table" using every plausible location (instance-level attribute, parent class attribute, member_descriptors...). The dispatch_table attribute with the highest priority (if any) is then used to set the value of the dispatch_table member of the current PicklerObject struct.

During pickling, the dispatch_table attribute that will be looked for is the dispatch_table member of the Pickler struct, see this line. So if one defines

  • either a class attribute named dispatch_table present in the __dict__ of any of the current Pickler instance subclasses
  • or an instance-level attribute named dispatch_table in the Pickler instance __dict__ (standard way of assigning dynamic attributes)

after after calling Pickler___init__, this attribute will not be used during subsequent pickler.dump calls. Indeed, setting class attributes, or dynamic attribute in an object's __dict__ does not try to set any member of the current instance's PyObject struct, so such attributes will not be used during pickling.

In this specific pull request, is that precise behavior that makes loky crash:

during CustomizablePickler's __init__, this new line:

self.dispatch_table = self.dispatch_table.copy()

is an attempt to redefine dispatch_table at the instance level of the current Pickler instance. However, a previous line

loky_pickler_cls.__init__(self, writer, protocol=protocol)

triggers a call to Cloudpickler.__init__, which calls Pickler.__init__. At this point, only CloudPickler.dispatch_table exists (the self.dispatch_table = self.dispatch_table.copy() line has not been executed yet), and thus Pickler->dispatch_table is set using CloudPickler.dispatch_table.

Now: the subsequent self.dispatch_table = self.dispatch_table.copy() triggers a setattr attempt. The interaction between CloudPickler's dispatch_table and Pickler.dispatch_table, makes Python decides that the outcome of this setattr attemp should be the creation of a new dynamic attribute in the instance __dict__, named dispatch_table. See _PyObject_GenericSetAttrWithDict to understand why this is the case.

Because of the reasons stated in the paragraph starting with "[During pickling...]", this attribute is not used during pickling, which is the reason why the test crashes.

Re-assignment of the Pickler's dispatch_table struct member however can be done using the following syntax:

Pickler.dispatch_table.__set__(pickler_instance, new_dispatch_table)

The classic dotted setattr-like syntax

pickler_instance.dispatch_table = new_dispatch_table

will modify the dispatch_table struct member only if the dispatch_table member descriptor is the highest-priority attribute named "dispatch_table" of pickler_instance.

ping @ogrisel and @tomMoral

@ogrisel
Copy link
Contributor

ogrisel commented Sep 22, 2020

What I do not understand is that, if you do not define dispatch_table and the class level you can do several consecutive pickler.dispatch_table = some_dict and they will always update the member and never end-up in the pickler.__dict__.

@ogrisel
Copy link
Contributor

ogrisel commented Sep 22, 2020

Ok the problem is that any CloudPIicklerSubclass.dispatch_table class-attributes shadows the CloudPIickler.dispatch_table member descriptors and threfore, the member descriptor is never accessed when calling setattr on the instance level.

@ogrisel
Copy link
Contributor

ogrisel commented Sep 22, 2020

To me this sounds like a design problem in the Pickler class. I think we should report upstream to CPython developers (even if we have a workaround by not definining CloudPickler.dispatch_table and initializing it in the __init__ as done in cloudpipe/cloudpickle#395.

@ogrisel
Copy link
Contributor

ogrisel commented Sep 24, 2020

Closing in favor of #272 which takes very subtle implementation details into account...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TestsProcessPool*Executor::test_serialization fails

4 participants