🐛 Bug
The documentation at https://pytorch.org/docs/stable/distributed.html specifies that BAND, BOR, and BXOR are supported reduction operators; however, they do not work with all_reduce when using the NCCL backend.
We can see in the code that the `ncclOp` map, which we use to look up the NCCL operation to run, has no entries for the bitwise operators. When the mapping is not specified, the map default-constructs a ncclRedOp_t (https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/api/types.html#ncclredop-t), whose zero value is ncclSum, so these reduction types are incorrectly mapped to ncclSum. This means that using the bitwise reduction ops silently performs a sum instead.
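A minimal standalone sketch of the failure mode (the enum values and map below are stand-ins for the real c10d code, not the actual implementation): `std::map::operator[]` value-initializes the mapped type for a missing key, and since ncclSum is the zero value of ncclRedOp_t, a lookup for a bitwise op silently yields a sum.

```cpp
#include <iostream>
#include <map>

// Stand-ins for the real types; in NCCL's ncclRedOp_t, ncclSum is 0.
enum ncclRedOp_t { ncclSum = 0, ncclProd = 1, ncclMax = 2, ncclMin = 3 };
enum class ReduceOp { SUM, PRODUCT, MIN, MAX, BAND, BOR, BXOR };

int main() {
  // Only the non-bitwise ops are mapped, mirroring the bug.
  std::map<ReduceOp, ncclRedOp_t> ncclOp = {
      {ReduceOp::SUM, ncclSum},
      {ReduceOp::PRODUCT, ncclProd},
      {ReduceOp::MIN, ncclMin},
      {ReduceOp::MAX, ncclMax},
  };

  // operator[] inserts a value-initialized ncclRedOp_t (i.e. 0) for the
  // missing key, and 0 happens to be ncclSum -- so BAND becomes a sum.
  ncclRedOp_t op = ncclOp[ReduceOp::BAND];
  std::cout << (op == ncclSum) << std::endl;  // prints 1
}
```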
cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @xush6528 @osalpekar @jiayisuse @agolynski
Sounds good, @thinking-tower. I think the first step for now can just be to throw a descriptive error if the operation is not supported (i.e., not in the `ncclOp` map), and probably to update our distributed documentation to indicate that BAND, BOR, and BXOR are not currently supported with the NCCL backend. In the longer term, we can discuss with NCCL and see if there is demand for these ops from users. A sketch of that first step follows below.