Paper Critique 2
Ali Farazdaghi
ELEC873
Ching-Hsiang Chu, Pouya Kousha, Ammar Ahmad Awan, Kawthar Shafie Khorassani, Hari Subramoni and Dhabaleswar K. (DK) Panda
ICS '20, Barcelona, Spain
Allgather
Allreduce
The Goal
An Allreduce algorithm that
- Is link-efficient
- Utilizes all links (NVLink, IB, ...)
Hardware/Software Features
- Direct LOAD/STORE
- NVLinks between CPUs and GPUs
- Overlapping Reduction with Communication
NVIDIA DGX-2
Summit
Link Utilization + Proposed (on Summit)
On DGX-2
2 Phases
- Reduce-Scatter (within a group and with other nodes)
- Allgather (Ring-like)
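The two phases can be illustrated with a small single-process simulation. This is a minimal sketch of the classic ring formulation (Reduce-Scatter followed by a ring Allgather), not the paper's link-aware design: the function name `ring_allreduce`, the list-of-lists "ranks", and the sum operator are all assumptions made for illustration, and no real GPU or network transfer takes place.

```python
def ring_allreduce(buffers):
    """Simulate a sum-Allreduce across len(buffers) ranks, in place.

    Hypothetical helper for illustration: buffers[r] is the local
    vector of rank r; all vectors have the same length, which must
    be divisible by the number of ranks.
    """
    p = len(buffers)
    n = len(buffers[0])
    assert n % p == 0, "vector length must be divisible by rank count"
    c = n // p  # elements per chunk

    def chunk(i):
        # Slice covering chunk i (chunk indices wrap around the ring).
        i %= p
        return slice(i * c, (i + 1) * c)

    # Phase 1: Reduce-Scatter. In step s, rank r sends chunk (r - s)
    # to its right neighbour, which accumulates it into its own copy.
    for step in range(p - 1):
        # Snapshot outgoing data so all sends in a step are simultaneous.
        outgoing = [list(buffers[r][chunk(r - step)]) for r in range(p)]
        for r in range(p):
            dst = (r + 1) % p
            sl = chunk(r - step)
            for k, v in zip(range(sl.start, sl.stop), outgoing[r]):
                buffers[dst][k] += v
    # Now rank r holds the fully reduced chunk (r + 1) % p.

    # Phase 2: ring Allgather. In step s, rank r forwards chunk
    # (r + 1 - s) to its right neighbour, which overwrites its copy.
    for step in range(p - 1):
        outgoing = [list(buffers[r][chunk(r + 1 - step)]) for r in range(p)]
        for r in range(p):
            dst = (r + 1) % p
            buffers[dst][chunk(r + 1 - step)] = outgoing[r]

    return buffers


if __name__ == "__main__":
    ranks = [
        [1, 2, 3, 4, 5, 6],
        [10, 20, 30, 40, 50, 60],
        [100, 200, 300, 400, 500, 600],
    ]
    for r, buf in enumerate(ring_allreduce(ranks)):
        print(f"rank {r}: {buf}")
```

In this sketch each rank sends only one chunk (1/p of the vector) per step, which is why the ring formulation is bandwidth-friendly; a real implementation would additionally have to map these steps onto the available links.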
Strengths
- Outperforms the state-of-the-art methods it is compared against
- Discusses the shortcomings of other methods
- Has the mathematics to back it up
Weaknesses
- Packs a lot into 10 pages
- Some symbols are not explicitly defined
LOAD-REDUCE-STORE-TO-*