Multi-agent Reinforcement Learning (AlphaStar)
Key Components
V-Trace
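V-trace is the off-policy correction introduced with IMPALA. It builds value targets from trajectories generated by a stale behaviour policy \(\mu\) while the learner improves the current policy \(\pi\). It uses the truncated importance weights \(\rho_t = \min(\bar\rho, \pi(a_t \vert s_t)/\mu(a_t \vert s_t))\) and \(c_t = \min(\bar c, \pi(a_t \vert s_t)/\mu(a_t \vert s_t))\), and the targets satisfy the backward recursion \(v_t = V(s_t) + \delta_t + \gamma c_t (v_{t+1} - V(s_{t+1}))\) with \(\delta_t = \rho_t (r_t + \gamma V(s_{t+1}) - V(s_t))\). Below is a minimal pure-Python sketch of that recursion for a single trajectory. The function name, the choice to pass log importance ratios, and the default thresholds \(\bar\rho = \bar c = 1\) are assumptions for illustration, not AlphaStar's code.

```python
import math
from typing import List


def vtrace_targets(
    rewards: List[float],
    values: List[float],
    bootstrap_value: float,
    log_rhos: List[float],
    gamma: float = 0.99,
    rho_bar: float = 1.0,
    c_bar: float = 1.0,
) -> List[float]:
    """V-trace value targets v_t for one trajectory (hypothetical helper).

    rewards[t]:      r_t
    values[t]:       V(s_t) under the current value function
    bootstrap_value: V(s_T), the estimate after the last step
    log_rhos[t]:     log(pi(a_t | s_t) / mu(a_t | s_t)), target vs. behaviour policy
    """
    T = len(rewards)
    ratios = [math.exp(lr) for lr in log_rhos]
    rhos = [min(rho_bar, r) for r in ratios]  # truncated weights on the TD error
    cs = [min(c_bar, r) for r in ratios]      # truncated trace coefficients
    next_values = values[1:] + [bootstrap_value]

    targets = [0.0] * T
    correction = 0.0  # v_{t+1} - V(s_{t+1}); zero past the end because v_T = V(s_T)
    for t in reversed(range(T)):
        delta = rhos[t] * (rewards[t] + gamma * next_values[t] - values[t])
        correction = delta + gamma * cs[t] * correction
        targets[t] = values[t] + correction
    return targets


# On-policy toy check: all log-ratios are 0, so the targets are plain returns.
print(vtrace_targets([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], 0.0, [0.0, 0.0, 0.0], gamma=1.0))
# [3.0, 2.0, 1.0]
```

With on-policy data and \(\bar\rho = \bar c = 1\) the targets reduce to ordinary bootstrapped n-step returns. When the policies drift apart, truncating the ratios bounds the importance weights, trading a small bias for much lower variance.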
TD(\(\lambda\))
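TD(\(\lambda\)) regresses the value function toward the \(\lambda\)-return, an exponentially weighted mix of all n-step returns. In its forward view it can be computed backwards over a trajectory with \(G_t^\lambda = r_t + \gamma\left[(1-\lambda) V(s_{t+1}) + \lambda G_{t+1}^\lambda\right]\), starting from \(G_T^\lambda = V(s_T)\). The following is a minimal sketch under the assumption of a single trajectory that does not terminate inside the window; the function name, defaults and toy numbers are hypothetical.

```python
from typing import List


def lambda_returns(
    rewards: List[float],
    values: List[float],
    bootstrap_value: float,
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """Lambda-returns G_t^lambda for one trajectory (hypothetical helper).

    rewards[t]:      r_t
    values[t]:       V(s_t)
    bootstrap_value: V(s_T), the estimate after the last step
    """
    next_values = values[1:] + [bootstrap_value]
    returns = [0.0] * len(rewards)
    g = bootstrap_value  # G_T^lambda = V(s_T)
    for t in reversed(range(len(rewards))):
        # Blend the one-step bootstrap V(s_{t+1}) with the longer return G_{t+1}.
        g = rewards[t] + gamma * ((1.0 - lam) * next_values[t] + lam * g)
        returns[t] = g
    return returns


# Forward-view TD(lambda) value loss: mean squared error to the lambda-returns,
# which are treated as fixed regression targets.
values = [0.0, 2.0]
targets = lambda_returns([0.0, 0.0], values, 4.0, gamma=1.0, lam=0.5)
loss = sum((g - v) ** 2 for g, v in zip(targets, values)) / len(values)
print(targets, loss)  # [3.0, 4.0] 6.5
```

Setting \(\lambda = 0\) recovers the one-step TD target \(r_t + \gamma V(s_{t+1})\), while \(\lambda = 1\) gives the full discounted return bootstrapped from \(V(s_T)\); intermediate values trade bias against variance.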
Architecture
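\[\pi_{\theta}(a_t \vert s_t, z) = \mathbb{P}[a_t \vert s_t, z]\]
where \(s_t\) summarizes the observations and actions up to time \(t\), and \(z\) is a statistic describing a strategy (such as a build order) sampled from human games.
General-purpose Neural Network Components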
- Observation of units –> Self-attention mechanism
- Spatial and non-spatial information –> Scatter connections
- Partial observability –> Deep LSTM
- Structured, combinatorial action space –> Auto-regressive policy and Recurrent Pointer Network
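Two of the components listed above fit together naturally. A self-attention layer lets each unit's feature vector attend to every other unit, so it handles an unordered set of units of any size. A pointer step then produces a probability distribution over those same units, which is how a policy can "select" one of a variable number of units. Below is a minimal single-head, pure-Python sketch. The toy features, the identity weight matrices and the query vector (which in a full model would come from the recurrent core) are hypothetical placeholders. AlphaStar's actual transformer and recurrent pointer network are much larger and select units sequentially rather than in one step.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: one row per unit


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """(n x k) @ (k x m) -> (n x m)."""
    cols = list(zip(*b))
    return [[dot(row, col) for col in cols] for row in a]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(units: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention over a set of units.

    Every unit attends to every unit, so the layer accepts any number of units
    and is permutation-equivariant: reordering inputs reorders the outputs.
    """
    q, k, v = matmul(units, w_q), matmul(units, w_k), matmul(units, w_v)
    scale = math.sqrt(len(k[0]))
    out = []
    for qi in q:
        weights = softmax([dot(qi, kj) / scale for kj in k])
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


def pointer_distribution(query: Vector, unit_embeddings: Matrix) -> Vector:
    """One pointer-network step: a softmax over the input units themselves,
    so the set of choices grows and shrinks with the number of units."""
    return softmax([dot(query, e) for e in unit_embeddings])


# Hypothetical toy data: 3 units with 2 features each.
units = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder for learned projection weights
embeddings = self_attention(units, identity, identity, identity)
probs = pointer_distribution([1.0, -1.0], embeddings)  # query assumed from the LSTM
print(len(embeddings), [round(p, 3) for p in probs])
```

Because every operation is defined row by row over the unit matrix, reordering the input units simply reorders the output embeddings, and the pointer distribution always has exactly one entry per unit.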
This post is licensed under CC BY 4.0 by the author.