
Multi-agent Reinforcement Learning (AlphaStar)

Key Components

V-Trace
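
As a rough illustration of what V-trace computes, the sketch below builds off-policy corrected value targets using the recursion from the IMPALA formulation: each TD error is weighted by a clipped importance ratio, and the correction is propagated backwards through a second clipped ratio. The function name `vtrace_targets`, its argument names, and the default clipping thresholds are assumptions made for this example, not code from AlphaStar.

```python
def vtrace_targets(rewards, values, bootstrap_value, ratios,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """V-trace value targets for one trajectory (hypothetical helper).

    rewards[t]      : reward received after acting at step t
    values[t]       : current value estimate V(x_t)
    bootstrap_value : V(x_n) for the state that follows the last step
    ratios[t]       : importance ratio pi(a_t | x_t) / mu(a_t | x_t)
    """
    n = len(rewards)
    next_values = list(values[1:]) + [bootstrap_value]
    targets = [0.0] * n
    correction = 0.0  # holds v_{t+1} - V(x_{t+1}) while walking backwards
    for t in reversed(range(n)):
        rho = min(rho_bar, ratios[t])  # clipped weight on the TD error
        c = min(c_bar, ratios[t])      # clipped weight on the propagated trace
        delta = rho * (rewards[t] + gamma * next_values[t] - values[t])
        correction = delta + gamma * c * correction
        targets[t] = values[t] + correction
    return targets


# On-policy data (all ratios equal to 1) gives the ordinary n-step return.
on_policy = vtrace_targets([1.0, 0.0, 2.0], [0.5, 0.2, 0.1], 0.3,
                           [1.0, 1.0, 1.0], gamma=0.9)
```

With all ratios equal to 1 the targets coincide with the bootstrapped n-step return; as a ratio approaches 0 the target at that step falls back to the current value estimate, so data the current policy would not have generated contributes little.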

TD(\(\lambda\))
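
A minimal sketch of the \(\lambda\)-return that TD(\(\lambda\)) regresses the value function towards, computed with the usual backward recursion \(G_t = r_t + \gamma[(1-\lambda)V(s_{t+1}) + \lambda G_{t+1}]\). The helper name `lambda_returns` and its arguments are assumptions for illustration only.

```python
def lambda_returns(rewards, next_values, gamma=0.99, lam=0.8):
    """Lambda-returns for one trajectory (hypothetical helper).

    rewards[t]     : reward received after acting at step t
    next_values[t] : value estimate V(s_{t+1}) of the state reached at step t;
                     the last entry bootstraps the tail of the trajectory
    """
    n = len(rewards)
    returns = [0.0] * n
    g = next_values[-1]  # beyond the last step, fall back to the bootstrap value
    for t in reversed(range(n)):
        # Blend the one-step bootstrap with the longer return from step t+1.
        g = rewards[t] + gamma * ((1.0 - lam) * next_values[t] + lam * g)
        returns[t] = g
    return returns


one_step = lambda_returns([1.0, 0.0, 2.0], [0.2, 0.1, 0.3], gamma=0.9, lam=0.0)
full = lambda_returns([1.0, 0.0, 2.0], [0.2, 0.1, 0.3], gamma=0.9, lam=1.0)
```

Setting `lam=0.0` recovers one-step TD targets, while `lam=1.0` gives the discounted sum of rewards up to the bootstrap value; intermediate values trade bias against variance.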

Architecture

\[\pi_{\theta}(a_t \vert s_t, z) = \mathbb{P}[a_t \vert s_t, z]\]

General-purpose Neural Network Components

Observation of Units

  • Self-attention mechanism
  • Spatial and non-spatial information → Scatter connections
  • Partial observability → Deep LSTM
  • Structured, combinatorial action space → Auto-regressive policy and Recurrent Pointer Network
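
To make the pointer idea in the last item concrete, here is a simplified sketch of a single pointer step: a query vector is scored against one key per unit, units that cannot be chosen are masked out, and a softmax over the remaining scores gives a distribution over which unit to select. This is not AlphaStar's actual network. The function `pointer_distribution`, the scaled dot-product scoring, and the toy vectors are all assumptions for illustration; in a full auto-regressive policy the query would depend on the recurrent state and on the action arguments already chosen.

```python
import math


def pointer_distribution(query, unit_keys, selectable):
    """Probability of pointing at each unit (hypothetical helper).

    query      : vector summarising the state and earlier action arguments
    unit_keys  : one key vector per observed unit, same length as query
    selectable : booleans marking which units are legal targets
    """
    if not any(selectable):
        raise ValueError("at least one unit must be selectable")
    scale = math.sqrt(len(query))
    scores = []
    for key, ok in zip(unit_keys, selectable):
        if ok:
            scores.append(sum(q * k for q, k in zip(query, key)) / scale)
        else:
            scores.append(float("-inf"))  # masked units get zero probability
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = pointer_distribution(
    query=[1.0, 0.0],
    unit_keys=[[2.0, 0.0], [0.0, 5.0], [1.0, 1.0]],
    selectable=[True, False, True],
)
```

Because the output has one entry per unit, the same step works for any number of observed units, which is what makes pointing a natural fit for a variable-size, combinatorial action space.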
This post is licensed under CC BY 4.0 by the author.