torch.nn.parallel.DistributedDataParallel() claims to be better than torch.nn.DataParallel(), yet DistributedDataParallel looks much more complicated to use.
We will evaluate this claim, explain what the wrapper does, and walk through how it is implemented. I assume you are familiar with PyTorch's dynamic computational graph and the Python GIL; the code discussed is from PyTorch v1.0.1.
Because the package is multi-process by nature, we need to initialize it before use.
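A minimal sketch of that initialization, assuming a single-machine setup (the backend, rendezvous address, rank, and world size below are illustrative placeholders, not values from this post):

```python
import torch.distributed as dist

# Initialize the default process group; DistributedDataParallel
# requires this to have run in every participating process.
dist.init_process_group(
    backend="nccl",                        # "gloo" is the CPU-friendly alternative
    init_method="tcp://127.0.0.1:23456",   # hypothetical rendezvous address
    rank=0,                                # this process's index in [0, world_size)
    world_size=1,                          # total number of processes
)
```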
DistributedDataParallel can be seen as a multi-node enhancement on top of DataParallel. Indeed, its implementation reuses some of the same methods as DataParallel.
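To make the wrapper concrete from the user's side, here is a hedged usage sketch (the toy model and single-GPU device id are assumptions, and the process group must already be initialized as above):

```python
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel

# Hypothetical toy module placed on the first GPU.
model = nn.Linear(10, 10).to("cuda:0")
ddp_model = DistributedDataParallel(model, device_ids=[0])

# The wrapped module is called like the plain one; gradient
# synchronization across processes happens during backward().
outputs = ddp_model(torch.randn(8, 10, device="cuda:0"))
outputs.sum().backward()
```

Forward and backward calls are unchanged, which is exactly what makes the wrapper attractive despite the extra setup.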