Deep RL Bipedal “Walker”

https://github.com/mcdermatt/RL/tree/master/HW3

This was my first adventure into deep RL and proved to be quite a difficult task. It was inspired by the early-2000s Flash game QWOP, in which the player must mash the Q, W, O, and P keys to control a runner's left and right thigh and calf muscles. I attempted to create a similar environment from scratch using the popular PyMunk physics engine. I chose this approach over the existing OpenAI Gym 2D Walker because the Gym environment has only 4 degrees of freedom and is statically stable, which leads to much simpler (and less cool) solutions to this control problem.

I created a DDPG Actor-Critic network to train the agent.
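For context, DDPG trains a deterministic actor alongside a critic, with slowly-tracking target copies of both networks. Below is a minimal numpy sketch of two of its core update rules: the Polyak (soft) target-network update and the critic's TD target. The parameter names (`tau`, `gamma`) are standard DDPG hyperparameters, but the values and the list-of-arrays representation of "network weights" are illustrative, not taken from my implementation.

```python
import numpy as np

TAU, GAMMA = 0.005, 0.99  # illustrative soft-update rate and discount factor

def soft_update(target_params, online_params, tau=TAU):
    """Polyak-average the online weights into the target network.

    Each entry is a numpy array standing in for one layer's weights.
    """
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]

def td_target(reward, q_next, done, gamma=GAMMA):
    """Critic regression target: r + gamma * Q'(s', mu'(s')) for non-terminal steps."""
    return reward + gamma * q_next * (1.0 - done)
```

In the full algorithm, `q_next` comes from the target critic evaluated at the target actor's action, and `soft_update` is applied to both target networks after each gradient step.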


I spent a considerable amount of time on reward shaping for this problem. Setting the reward to be purely a function of horizontal distance traveled caused the agent to dive forward and faceplant, while rewarding each timestep survived before falling resulted in a stable crouched position with no forward progression.
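A blended reward is one way to avoid both failure modes. The toy sketch below combines a forward-progress term with a small alive bonus that is only paid while the torso is upright; all weights, thresholds, and state fields here are hypothetical, not the values used in the project.

```python
def shaped_reward(dx, torso_height, fell,
                  w_progress=1.0, w_alive=0.05, min_height=0.8):
    """Blend forward progress with a gated alive bonus.

    dx:           horizontal distance covered this timestep
    torso_height: current torso height; gates the alive bonus so that
                  crouching or diving forward is not rewarded
    fell:         True once the agent has fallen
    All weights are illustrative.
    """
    if fell:
        return -10.0               # terminal penalty for faceplanting
    r = w_progress * dx            # reward forward motion...
    if torso_height > min_height:  # ...but only pay the alive bonus upright
        r += w_alive
    return r
```

The gated bonus discourages the "stable crouch" local optimum, while the progress term keeps the faceplant strategy from dominating.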

It was important to check the visualization of the simulation every few hundred epochs, because the agent would occasionally stumble across an exploit in the physics that rewarded unrealistic behavior, such as smashing a limb through the ground plane.


The highest-scoring policy my network generated used a cartwheel rather than a true walk. I originally considered penalizing any state in which the agent went upside down; however, a front-flipping robot looks pretty cool.
