Hey Authors,
I've been experimenting with some of CALM's parameters and was wondering whether you have any insight into passing previous actions to the policy as part of its observations.
Other work similar to CALM claims that including previous actions and states in the policy's observations reduces vibrations and other higher-order artifacts in the resulting behavior.
Could it be that the 64-D latent representation of the reference motion carries enough signal that, combined with the observation history, the network can learn what it would otherwise have learned from a history of previous actions?
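For concreteness, here is a minimal sketch of the two input layouts I'm comparing. This is not CALM's code: `ObsBuilder`, the history length, and zero-padding the histories at the start of an episode are my own hypothetical choices. The only difference between the two variants is whether the flattened previous-action history is appended after the state history and the 64-D latent.

```python
from collections import deque


class ObsBuilder:
    """Hypothetical helper (not from CALM): builds a flat policy input from a
    state history, the 64-D motion latent, and optionally previous actions."""

    def __init__(self, state_dim, action_dim, latent_dim=64,
                 history_len=2, include_prev_actions=True):
        self.latent_dim = latent_dim
        self.include_prev_actions = include_prev_actions
        # Zero-filled histories keep the input size fixed from the first step.
        self.states = deque([[0.0] * state_dim for _ in range(history_len)],
                            maxlen=history_len)
        self.actions = deque([[0.0] * action_dim for _ in range(history_len)],
                             maxlen=history_len)

    def build(self, state, latent):
        """Push the newest state, then return [state history | latent | action history]."""
        assert len(latent) == self.latent_dim
        self.states.append(list(state))
        obs = [x for s in self.states for x in s]  # oldest state first
        obs += list(latent)
        if self.include_prev_actions:
            obs += [x for a in self.actions for x in a]  # oldest action first
        return obs

    def record_action(self, action):
        """Call after the policy acts so the next observation includes this action."""
        self.actions.append(list(action))


with_actions = ObsBuilder(state_dim=3, action_dim=2)
without_actions = ObsBuilder(state_dim=3, action_dim=2, include_prev_actions=False)
latent = [0.0] * 64
print(len(with_actions.build([1.0, 2.0, 3.0], latent)))     # 2*3 + 64 + 2*2 = 74
print(len(without_actions.build([1.0, 2.0, 3.0], latent)))  # 2*3 + 64 = 70
```

My question is essentially whether the extra action-history slice at the end of the first layout adds anything the policy can't already infer from the state history plus the latent.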
What are your thoughts on this one?
Thanks!