Hello,
After shelving my massive, broken cleanup PR, I set out to try again with smaller steps, more commits, and more testing to make sure I didn't break anything.
My very first move was to remove the NuGet dependencies within the library so that its projects reference each other strictly as project references, allowing tighter coupling and faster development. After doing so, I observed a sharp decline in the CartPole-v1 example's ability to "learn" and make meaningful progress, even with exactly the same hyperparameters. I am not sure in which version of RLMatrix this regression first became noticeable. My memory is a little foggy because I ran this test several weeks ago, but I believe that, since the current master branch references a 0.4.x version, I picked an older version (0.2.0) more or less at random, and the issue went away.
I wrote a very simple example demonstrating this issue on the nouveau-2.0 branch of my fork of RLMatrix. A video demonstrating the situation can be found below. Changing Old to false in CartPole-v1.csproj produces a stark difference in performance. I would be happy to provide relevant logs or other debugging information if needed; tracking down details that could explain this is a little beyond my expertise.
2025-03-18.19-54-57-00.00.00.000-00.01.40.967.mp4
It's entirely possible that I'm missing something here! I'm hoping that I'm simply misusing the newer code in some way that makes it behave differently from the older code.