A new family of policy gradient methods for reinforcement learning that alternate between sampling data through interaction with the environment and optimizing a “surrogate” objective function using stochastic gradient ascent.
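The alternation described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the project's implementation: a hypothetical two-armed bandit stands in for the environment, the policy is a softmax over two logits, and the surrogate is the plain importance-ratio objective (mean of ratio times advantage) without any further constraint. The function names `softmax` and `train` and all hyperparameters are invented for this example.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities, subtracting the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(iterations=50, batch_size=64, epochs=4, lr=0.5, seed=0):
    """Alternate between sampling a batch and ascending a surrogate objective."""
    rng = random.Random(seed)
    rewards = [0.0, 1.0]  # hypothetical bandit: arm 1 always pays more
    theta = [0.0, 0.0]    # policy logits, one per arm

    for _ in range(iterations):
        # Phase 1: sample data by interacting with the (toy) environment
        # under the current policy, which is frozen as the "old" policy.
        old_probs = softmax(theta)
        actions = rng.choices([0, 1], weights=old_probs, k=batch_size)
        batch_rewards = [rewards[a] for a in actions]
        baseline = sum(batch_rewards) / batch_size
        advantages = [r - baseline for r in batch_rewards]

        # Phase 2: several gradient ascent steps on the surrogate
        #   L(theta) = mean(pi_theta(a) / pi_old(a) * advantage)
        # using the same sampled batch each time.
        for _ in range(epochs):
            probs = softmax(theta)
            grad = [0.0, 0.0]
            for a, adv in zip(actions, advantages):
                ratio = probs[a] / old_probs[a]
                for k in range(2):
                    indicator = 1.0 if k == a else 0.0
                    # d(ratio)/d(theta_k) = ratio * (indicator - probs[k])
                    grad[k] += ratio * adv * (indicator - probs[k])
            theta = [t + lr * g / batch_size for t, g in zip(theta, grad)]

    return softmax(theta)


if __name__ == "__main__":
    print(train())
```

The gradient here is stochastic only in the sense that it is estimated from a sampled batch; a fuller version would likely split each batch into minibatches and bound how far the ratio may move from 1.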

Organization

Student

Xiaohong Ji

Mentors

  • Manish Kumar
  • Marcus Edel

2019