Paper Critique 2

Ali Farazdaghi

ELEC873

NV-Group: Link-Efficient Reduction for Distributed Deep Learning on Modern Dense GPU Systems

Ching-Hsiang Chu, Pouya Kousha, Ammar Ahmad Awan, Kawthar Shafie Khorassani, Hari Subramoni and Dhabaleswar K. (DK) Panda

ICS '20, Barcelona, Spain

Allgather
Allreduce

The Goal

An Allreduce algorithm that
  • Is link-efficient
  • Utilizes all available links (NVLink, IB, ...)

Hardware/Software Features

  • Direct LOAD/STORE
  • NVLinks between CPUs and GPUs
  • Overlapping Reduction with Communication
NVIDIA DGX-2
Summit
Link Utilization (on Summit)
Link Utilization + Proposed (on Summit)
On DGX-2

How?!

2 Phases

  1. Reduce-Scatter (within the group and with other nodes)
  2. Allgather (Ring-like)
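
The two phases can be illustrated with a small single-process simulation. This is only a sketch of the generic reduce-scatter plus ring-allgather structure, not the paper's implementation: it does not model NVLink groups, inter-node links, or the overlap of reduction with communication. The function names (`reduce_scatter`, `ring_allgather`, `allreduce`) are hypothetical, and each "rank" is just a Python list.

```python
from typing import List

Vector = List[float]


def reduce_scatter(buffers: List[Vector]) -> List[Vector]:
    """Phase 1: rank r ends up owning the element-wise sum of chunk r."""
    p = len(buffers)
    n = len(buffers[0])
    assert n % p == 0, "sketch assumes the vector splits evenly across ranks"
    chunk = n // p
    owned = []
    for r in range(p):
        lo, hi = r * chunk, (r + 1) * chunk
        # Sum the r-th chunk over every rank's input buffer.
        owned.append([sum(buf[i] for buf in buffers) for i in range(lo, hi)])
    return owned


def ring_allgather(owned: List[Vector]) -> List[Vector]:
    """Phase 2: circulate the reduced chunks around a ring in p - 1 steps."""
    p = len(owned)
    # slots[r][c] holds chunk c once rank r has it; None means "not yet".
    slots = [[None] * p for _ in range(p)]
    for r in range(p):
        slots[r][r] = owned[r]
    for step in range(p - 1):
        for r in range(p):
            c = (r - step) % p            # chunk that rank r forwards now
            slots[(r + 1) % p][c] = slots[r][c]  # send to right neighbour
    # Concatenate the chunks in order to rebuild the full vector per rank.
    return [[x for chunk in slots[r] for x in chunk] for r in range(p)]


def allreduce(buffers: List[Vector]) -> List[Vector]:
    """Allreduce (sum) as reduce-scatter followed by ring allgather."""
    return ring_allgather(reduce_scatter(buffers))


if __name__ == "__main__":
    ranks = [[1, 2, 3, 4], [10, 20, 30, 40]]
    print(allreduce(ranks))
```

After the call, every simulated rank holds the same fully reduced vector, which is the defining property of Allreduce.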

Strengths

  • Outperforms the state-of-the-art methods it is compared against
  • Discusses the shortcomings of other methods
  • Has the mathematics to back it up

Weaknesses

  • Packs a lot into 10 pages
  • Some symbols are not explicitly defined
    (e.g. LOAD-REDUCE-STORE-TO-*)

Q/A

Refs are hyperlinked

Link to presentation repo