Could be a bug with timestep embeddings for the MidU reward model
- Currently it takes only fully denoised images. In the future we want to try noisy images so that it can give rewards for in-progress generations (steering at different stages could help in different ways)
- Have to walk before we can run, so for now just working with fully denoised images
- Timesteps need to reflect that the images are entirely denoised
- Currently using a timestep value of 0, but not sure whether that signifies pure signal or pure noise to the model
- Need to get a better dataset to test on
- To validate that the rest of the code is working, will need to get good results on that dataset with the CLIP RM first
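
A quick way to sanity-check the timestep question is to compute the signal and noise scales at both ends of the schedule. The sketch below assumes a DDPM-style linear beta schedule with 1000 steps, where x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. That schedule, the step count, and the helper names are all assumptions for illustration; the MidU reward model's actual convention still has to be checked against its training code.

```python
import math

def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (assumed DDPM defaults; the real model may differ)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alphas_cumprod(betas):
    """Running product of (1 - beta_t), i.e. alpha_bar_t for each timestep."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def signal_noise_scales(t, alpha_bar):
    """Return (signal_scale, noise_scale) for x_t = signal*x_0 + noise*eps."""
    a = alpha_bar[t]
    return math.sqrt(a), math.sqrt(1.0 - a)

alpha_bar = alphas_cumprod(linear_betas())
sig0, noise0 = signal_noise_scales(0, alpha_bar)      # first timestep
sigT, noiseT = signal_noise_scales(999, alpha_bar)    # last timestep

print(f"t=0:   signal={sig0:.4f}  noise={noise0:.4f}")
print(f"t=999: signal={sigT:.4f}  noise={noiseT:.4f}")
```

Under this assumed convention, t=0 is the almost-pure-signal end and the largest timestep is the almost-pure-noise end. If the reward model was trained with the opposite indexing, passing 0 for fully denoised images would be the bug.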