torch.distributed NCCL backend does not support bitwise reduction ops

🐛 Bug

The documentation at https://pytorch.org/docs/stable/distributed.html lists BAND, BOR, and BXOR as supported reduction operators; however, they do not work with all_reduce when using the NCCL backend.

We can see in the code that there is no entry for the bitwise operators in the map that translates torch.distributed reduction ops to NCCL operations. When an op is missing from the map, the lookup default-constructs a ncclRedOp_t value (https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/api/types.html#ncclredop-t), which is 0, i.e. ncclSum. As a result, using any of these bitwise reduction ops with the NCCL backend silently performs a sum instead.
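For reference, a minimal repro sketch along these lines (assumes a single node with at least 2 GPUs; the MASTER_ADDR/MASTER_PORT values and the spawn-based launcher are just placeholders for whatever setup is used):

```python
import os
import torch
import torch.distributed as dist

def run(rank, world_size):
    # Placeholder rendezvous settings for a single-node repro.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    # Every rank holds the value 1. BAND should therefore return 1 on every
    # rank, but with the NCCL backend the op silently falls back to a sum,
    # so each rank observes world_size instead.
    t = torch.ones(1, dtype=torch.int64, device="cuda")
    dist.all_reduce(t, op=dist.ReduceOp.BAND)
    print(f"rank {rank}: {t.item()} (expected 1 with BAND)")

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 2
    torch.multiprocessing.spawn(run, args=(world_size,), nprocs=world_size)
```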

cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @xush6528 @osalpekar @jiayisuse @agolynski

1 possible answer(s) on "torch.distributed NCCL backend does not support bitwise reduction ops"

  1. Sounds good, @thinking-tower. I think the first step for now can just be to throw a descriptive error if the operation is not supported (i.e. not in the ncclOp map), and to update our distributed documentation to indicate that BAND, BOR, and BXOR are not currently supported with the NCCL backend (a rough sketch of such a check follows below this reply).

    In the longer term, we can discuss with the NCCL team and see whether there is demand for these ops from users.
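To make the proposal concrete, here is a rough sketch of the kind of check and error message being suggested, written as a user-level Python wrapper purely for illustration; the actual fix would belong in the NCCL process group's op-mapping code, and the wrapper name below is hypothetical:

```python
import torch.distributed as dist

def nccl_all_reduce(tensor, op=dist.ReduceOp.SUM, group=None):
    # Illustrative guard only: raise a descriptive error instead of silently
    # falling back to a sum when a bitwise op is requested on NCCL.
    if op in (dist.ReduceOp.BAND, dist.ReduceOp.BOR, dist.ReduceOp.BXOR):
        raise RuntimeError(
            f"Cannot use ReduceOp {op} with the NCCL backend: NCCL does not "
            "expose bitwise reduction operations (BAND/BOR/BXOR). Consider "
            "using the gloo backend for these ops."
        )
    return dist.all_reduce(tensor, op=op, group=group)
```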