What results does the paper report for the proposed algorithm?

The paper reports that, using the same learning algorithm, network architecture, and hyper-parameters, the method robustly solves more than 20 simulated physics tasks and achieves performance competitive with a planning algorithm that has full access to the dynamics of the domain and its derivatives.[S1] The paper further demonstrates that for many tasks the algorithm can learn policies end-to-end directly from raw pixel inputs.[S1]

Continuous control with deep reinforcement learning (arXiv:1509.02971)

This paper adapts ideas underlying the success of Deep Q-Learning to the continuous action domain and presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. The paper reports that, with the same learning algorithm, network architecture, and hyper-parameters, the method robustly solves more than 20 simulated physics tasks and can learn some tasks end-to-end from raw pixel inputs.

What this paper is about

The paper adapts ideas underlying the success of Deep Q-Learning to the continuous action domain and presents an actor-critic, model-free algorithm, based on the deterministic policy gradient, that can operate over continuous action spaces.[S1] It reports that, using the same learning algorithm, network architecture, and hyper-parameters across tasks, the algorithm robustly solves more than 20 simulated physics tasks, with examples that include cartpole swing-up, dexterous manipulation, legged locomotion, and car driving.[S1] It also reports that the algorithm finds policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives, and that for many tasks it can learn policies end-to-end directly from raw pixel inputs.[S1]

Core claims to remember

The paper states that it adapts ideas underlying the success of Deep Q-Learning to handle continuous action domains.[S1] It describes the presented method as an actor-critic, model-free algorithm that is based on the deterministic policy gradient and can operate over continuous action spaces.[S1]
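To make the phrase "based on the deterministic policy gradient" concrete, here is a minimal sketch of one actor update, not the paper's implementation. It assumes a toy setup that does not come from the paper: a 1-D state, a linear actor `mu(s) = theta * s`, and a fixed hand-written critic in place of a learned network. All function names are hypothetical. The point is only the chain rule: the actor's parameter moves along dQ/da times dmu/dtheta.

```python
# Toy deterministic policy gradient step in pure Python.
# Assumptions (not from the paper): 1-D state, linear actor, and a fixed
# hand-written critic Q(s, a) = -(a - 2s)^2 instead of a learned network.

def actor(theta, s):
    """Deterministic policy: maps a state to a single continuous action."""
    return theta * s

def critic(s, a):
    """Stand-in critic; for state s the best action is a = 2 * s."""
    return -(a - 2.0 * s) ** 2

def dq_da(s, a):
    """Analytic gradient of the toy critic with respect to the action."""
    return -2.0 * (a - 2.0 * s)

def actor_gradient(theta, states):
    """Batch average of dQ/da * dmu/dtheta (the chain rule through the actor)."""
    total = 0.0
    for s in states:
        a = actor(theta, s)
        total += dq_da(s, a) * s  # dmu/dtheta = s for the linear actor
    return total / len(states)

def train_actor(theta=0.0, lr=0.1, steps=200):
    """Gradient ascent on the critic's value of the actor's own actions."""
    states = [-1.0, -0.5, 0.5, 1.0]
    for _ in range(steps):
        theta += lr * actor_gradient(theta, states)
    return theta
```

Calling `train_actor()` should drive `theta` toward 2, the slope at which the actor's action maximizes this toy critic in every state. In the full method the critic would itself be learned, which this sketch deliberately leaves out.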

The paper reports that the same learning algorithm, network architecture, and hyper-parameters are used across a range of tasks, and that this approach robustly solves more than 20 simulated physics tasks.[S1] The tasks it names include classic problems such as cartpole swing-up, as well as dexterous manipulation, legged locomotion, and car driving.[S1]

The paper reports that the algorithm finds policies whose performance is competitive with those found by a planning algorithm, and it specifies that this planner has full access to the dynamics of the domain and its derivatives.[S1] The paper further demonstrates that for many tasks the algorithm can learn policies end-to-end, directly from raw pixel inputs.[S1]

Limitations and caveats

The results summarized here are for simulated physics tasks: more than 20 of them, including cartpole swing-up, dexterous manipulation, legged locomotion, and car driving.[S1] The comparison baseline is a planning algorithm with full access to the dynamics of the domain and its derivatives, and the reported claim is that the learned policies are competitive with that planner.[S1] End-to-end learning directly from raw pixel inputs is reported for many tasks, not for all of them.[S1]

How to apply this in study or projects

Read the part of the paper that adapts ideas underlying the success of Deep Q-Learning to the continuous action domain, and write down what changes are introduced to move from discrete actions to continuous action spaces.[S1] Identify the components of the actor-critic, model-free algorithm and connect each component to the statement that the method is based on the deterministic policy gradient and operates over continuous action spaces.[S1]
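As a starting point for that component list, the sketch below shows two pieces commonly associated with Deep Q-Learning that an actor-critic method can reuse: a replay buffer and a slowly updated target copy of the parameters. This brief does not spell out which ideas the paper carries over, so treat the choice of components, the class and function names, and the `tau` parameter as assumptions for illustration rather than the paper's exact design.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of transitions, sampled uniformly at random."""

    def __init__(self, capacity):
        # A bounded deque drops the oldest transition once capacity is reached.
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Sampling without replacement breaks up consecutive transitions.
        return random.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)

def soft_update(target_params, online_params, tau):
    """Move each target parameter a fraction tau toward its online counterpart."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]
```

With a small `tau`, the target parameters trail the online ones slowly, which is one way to keep the critic's learning targets from shifting too quickly. When reading the paper, check each of these pieces against what the authors actually describe.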

Track how the paper keeps the same learning algorithm, network architecture, and hyper-parameters across tasks, and list the tasks the paper names as examples of the simulated physics suite.[S1] Extract the paper’s reported task set size and the claim that the method robustly solves more than 20 simulated physics tasks, and record the specific classic problems mentioned.[S1]

Write a short comparison note that restates the paper’s evaluation claim about competitiveness with a planning algorithm, including the detail that the planning algorithm has full access to the dynamics of the domain and its derivatives.[S1] Make a separate note for the paper’s end-to-end learning claim, including the detail that policies are learned directly from raw pixel inputs for many tasks.[S1]
