First draft of intro paragraph on DDPG and DQNs by JamesMBartlett · Pull Request #15 · mlberkeley/openbrain

JamesMBartlett · 2016-09-22T18:31:39Z

No description provided.

MadcowD

Looks good! Just want some more detail.

MadcowD · 2016-09-22T21:05:14Z

docs/arxiv/empirical/main.tex


 \section{Introduction}
 \todo[inline]{Introduction to DDPG and recent advances in deep RL. }
+[INSERT OPENING SENTENCE HERE] The current state-of-the-art in deep reinforcement learning is the Deep Deterministic Policy Gradient (DDPG) algorithm [\cite{lillicrap2015ddpg}] which expanded the deterministic policy gradient algorithm [\cite{silver2014dpg}] to continuous, high dimensional action spaces, with much success. The basic idea of DDPG is to use an actor-critic algorithm based on the DPG algorithm, where the critic $Q(s, a)$ is learned as in deep Q network learning [\cite{mnih2013dqn}], which is a model-free learning regime, and the actor $\mu(s)$ is updated based on sampling the policy gradient from [\cite{silver2014dpg}]. This algorithm had success comparable to planning based solvers on many physical control problems. 


This looks good, but I would then add a second paragraph describing the downsides of this algorithm. *We need to motivate the rest of the paper!* It could be about:

1. Divergence.
2. Hyperparameter instability: some $\gamma$s work and others do not, and in general the method requires a lot of tuning. Obviously you need to cite evidence for this argument.
3. The replay buffer is hacky. Try to deconstruct the reasons why its use is essential for the DDPG algorithm.
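For reference while drafting that paragraph, the two updates the draft describes could be written out roughly as below. This is only a sketch: the notation follows the DDPG paper (target networks $Q'$ and $\mu'$, a minibatch of $N$ transitions $(s_i, a_i, r_i, s_{i+1})$ sampled from the replay buffer), which is an assumption about what this paper will adopt, and `align*` assumes `amsmath` is loaded in `main.tex`.

```latex
\begin{align*}
  % Critic: regress Q toward a one-step TD target, as in DQN
  y_i &= r_i + \gamma \, Q'\big(s_{i+1}, \mu'(s_{i+1})\big), \\
  L   &= \frac{1}{N} \sum_{i} \big( y_i - Q(s_i, a_i) \big)^2, \\
  % Actor: sampled deterministic policy gradient
  \nabla_{\theta^\mu} J &\approx \frac{1}{N} \sum_{i}
    \nabla_a Q(s, a)\big|_{s = s_i,\, a = \mu(s_i)} \,
    \nabla_{\theta^\mu} \mu(s)\big|_{s = s_i}.
\end{align*}
```

Writing it this way makes the three points easier to pin down: $\gamma$ enters only through the target $y_i$, the target itself depends on the learned networks (one plausible source of divergence), and the sum over $i$ is exactly where the replay buffer comes in.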

MadcowD · 2016-10-08T17:38:27Z

@JamesMBartlett Any updates on this?

First draft of intro paragraph on DDPG and DQNs

372065d

MadcowD requested changes Sep 22, 2016

JamesMBartlett wants to merge 1 commit into mlberkeley:master from JamesMBartlett:introparagraph
