Error message (from this job):
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1492, in wrapper
    return func(*args, **kwargs)
  File "distributed/test_c10d.py", line 507, in test_common_errors
    next(gen)
AssertionError: ValueError not raised
According to the HUD, this is the timeline:
- c0adabe test started failing
- f595ba1 test stopped failing
- 8c798e0 test started failing again
- 1fe6a65 test switched from shard 2 to shard 1
cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @agolynski @SciPioneer @H-Huang @mrzzd @cbalioglu
This looks like a test-ordering issue, because the test passes when run individually in the same environment.
I can confirm that the following environment variables were added before the test is run: