Paddle: Optimizing Network Performance for Distributed DNN Training on GPU Clusters

Posted by vfh0ocws on 2021-11-29 in Java
  • AllReduce SelectedRows
      • without CSC
      • with CSC

  • Optimizing Network Performance for Distributed DNN Training on GPU Clusters
      • Get the system architecture and performance.
      • Analyze the operator time and communication time.

  • Mixed precision
      • On BERT
      • On ResNet-50 on the ImageNet dataset

  • Dynamic (static) LA (lazy allreduce) overlap

  • Fuse allreduce tensors and analyze the performance.

  • Implement hierarchical all-reduce.

  • CSC communication
      • ResNet
      • BERT

  • Pserver: move synchronization from per-step to per-variable.
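
The "hierarchical all-reduce" item refers to a two-level scheme:

1. Reduce inside each node over the fast local links.
2. All-reduce only among one leader GPU per node across the network.
3. Broadcast the result back inside each node.

Below is a minimal sketch of that idea, not Paddle's implementation. It assumes a simulated cluster laid out as `grads[node][gpu]`, and plain Python sums stand in for real NCCL/MPI collectives. The function names `elementwise_sum` and `hierarchical_allreduce` are hypothetical.

```python
from typing import List

Vector = List[float]


def elementwise_sum(vectors: List[Vector]) -> Vector:
    """Stand-in for a sum-reduction collective: adds vectors element by element."""
    return [sum(vals) for vals in zip(*vectors)]


def hierarchical_allreduce(grads: List[List[Vector]]) -> List[List[Vector]]:
    """grads[node][gpu] is the local gradient of one GPU (hypothetical layout).

    Returns the same nested shape with every GPU holding the global sum.
    """
    # Step 1: intra-node reduce. Each node's leader (GPU 0) gets the node-local
    # sum over the fast intra-node link.
    node_sums = [elementwise_sum(node) for node in grads]

    # Step 2: inter-node all-reduce among the leaders only, so just one GPU per
    # node sends traffic over the slower cross-node network.
    global_sum = elementwise_sum(node_sums)

    # Step 3: intra-node broadcast of the global sum from each leader to its peers.
    return [[list(global_sum) for _ in node] for node in grads]


if __name__ == "__main__":
    # 2 nodes x 2 GPUs, each GPU holding a 2-element gradient.
    grads = [
        [[1.0, 2.0], [3.0, 4.0]],
        [[5.0, 6.0], [7.0, 8.0]],
    ]
    print(hierarchical_allreduce(grads)[1][1])  # [16.0, 20.0]
```

The result is identical to a flat all-reduce over all four GPUs. The difference is that only the two node leaders exchange data across the network, which is where the expected bandwidth saving on multi-node GPU clusters comes from.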

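"Fuse allreduce tensors" generally means packing many small gradients into larger contiguous buffers. Each collective call then carries more data, and the per-call latency is paid fewer times. The following is a hypothetical illustration, not Paddle's fusion pass. The greedy in-order packing, the `max_bucket` limit (counted in elements), and the helper names are all assumptions, and the all-reduce is simulated as an element-wise sum across ranks.

```python
from typing import Dict, List, Tuple

Grads = Dict[str, List[float]]


def build_buckets(sizes: Dict[str, int], max_bucket: int) -> List[List[str]]:
    """Greedily pack tensors (in order) into buckets of at most max_bucket elements.

    A single tensor larger than max_bucket gets a bucket of its own.
    """
    buckets: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in sizes.items():
        if current and current_size + size > max_bucket:
            buckets.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        buckets.append(current)
    return buckets


def fused_allreduce(per_rank: List[Grads], max_bucket: int) -> Tuple[List[Grads], int]:
    """Sum gradients across ranks with one simulated collective per bucket.

    Assumes every rank holds the same tensor names with the same sizes.
    Returns the reduced gradients for each rank and the number of collectives issued.
    """
    sizes = {name: len(t) for name, t in per_rank[0].items()}
    buckets = build_buckets(sizes, max_bucket)
    out: List[Grads] = [{} for _ in per_rank]
    for bucket in buckets:
        # Flatten this bucket into one contiguous buffer per rank.
        flat = [[x for name in bucket for x in rank[name]] for rank in per_rank]
        # One all-reduce for the whole buffer (simulated as an element-wise sum).
        reduced = [sum(vals) for vals in zip(*flat)]
        # Unflatten: slice the reduced buffer back into per-tensor gradients.
        offset = 0
        for name in bucket:
            chunk = reduced[offset:offset + sizes[name]]
            offset += sizes[name]
            for rank_out in out:
                rank_out[name] = list(chunk)
    return out, len(buckets)


if __name__ == "__main__":
    rank0 = {"a": [1.0, 2.0], "b": [3.0], "c": [4.0, 5.0, 6.0]}
    rank1 = {"a": [10.0, 20.0], "b": [30.0], "c": [40.0, 50.0, 60.0]}
    reduced, calls = fused_allreduce([rank0, rank1], max_bucket=3)
    print(calls)            # 2 (buckets [a, b] and [c]) instead of 3 per-tensor calls
    print(reduced[0]["c"])  # [44.0, 55.0, 66.0]
```

Choosing `max_bucket` is the main trade-off when analyzing performance. Larger buckets mean fewer calls, but the first collective cannot start until every gradient in its bucket is ready, which reduces overlap with the backward pass.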