“Report Results” test intermittently fails for unsharded tests with “Duplicate test case…”

See https://app.circleci.com/pipelines/github/pytorch/pytorch/284234/workflows/a9fb7db0-33f6-4364-9ec0-16dfb35052c6/jobs/11485715 for example:

Mar 12 01:36:17     raise RuntimeWarning(f'Duplicate test case {test_case.name} in suite {suite_name} called from {self.name}')
Mar 12 01:36:17 RuntimeWarning: Duplicate test case test_AdaptiveMaxPool1d_indices_cpu_float32 in suite TestNNDeviceTypeCPU called from test_nn

This happens because test_nn was selected twice: first into shard 1, when AWS was inaccessible, and then into shard 2, once AWS had become accessible:

Mar 11 23:59:28 + test_python_shard1
Mar 11 23:59:28 + python test/run_test.py --exclude-jit-executor --shard 1 2 --verbose --determine-from=
Mar 11 23:59:30 Grabbing reports from nightly commit: 30b9583650c9d9ad4e1172c92bf57ec3f255061b
Mar 11 23:59:30 Selected tests: test_autograd, test_nn, distributed/rpc/test_process_group_agent, test_unary_ufuncs, test_jit, test_xnnpack_integration, distributed/rpc/test_faulty_agent, distributed/test_distributed_fork, test_cpp_extensions_jit, test_dataloader, test_tensor_creation_ops, test_type_hints, test_tensorboard, test_binary_ufuncs, test_sparse, test_multiprocessing, test_determination, test_foreach, test_utils, test_view_ops, distributed/test_c10d_spawn, test_cpp_api_parity, test_multiprocessing_spawn, test_openmp, test_vmap, test_mobile_optimizer, test_shape_ops, test_indexing, test_namedtuple_return_api, test_namedtensor, test_logging, test_jit_py3, benchmark_utils/test_benchmark_utils, distributed/test_nccl, test_futures, test_bundled_inputs, test_jit_disabled, test_function_schema, test_cpp_extensions_aot_ninja, distributed/test_jit_c10d, test_pytree, test_show_pickle, test_license, distributed/nn/jit/test_instantiator, test_public_bindings, test_vulkan, test_dataset, distributions/test_constraints, test_pruning_op, distributed/pipeline/sync/skip/test_api, distributed/pipeline/sync/skip/test_gpipe, distributed/pipeline/sync/skip/test_inspect_skip_layout, distributed/pipeline/sync/skip/test_leak, distributed/pipeline/sync/skip/test_portal, distributed/pipeline/sync/skip/test_stash_pop, distributed/pipeline/sync/skip/test_tracker, distributed/pipeline/sync/skip/test_verify_skippables, distributed/pipeline/sync/test_balance, distributed/pipeline/sync/test_bugs, distributed/pipeline/sync/test_checkpoint, distributed/pipeline/sync/test_copy, distributed/pipeline/sync/test_deferred_batch_norm, distributed/pipeline/sync/test_dependency, distributed/pipeline/sync/test_inplace, distributed/pipeline/sync/test_microbatch, distributed/pipeline/sync/test_phony, distributed/pipeline/sync/test_pipe, distributed/pipeline/sync/test_pipeline, distributed/pipeline/sync/test_stream, distributed/pipeline/sync/test_transparency, distributed/pipeline/sync/test_worker
...
Mar 12 00:47:02 + test_python_shard2
Mar 12 00:47:02 + python test/run_test.py --exclude-jit-executor --shard 2 2 --verbose --determine-from=
Mar 12 00:47:04 Selected tests: test_ops, test_nn, distributed/rpc/test_tensorpipe_agent, test_linalg, distributed/test_distributed_spawn, distributed/rpc/test_faulty_agent, distributed/test_distributed_fork, test_spectral_ops, test_mkldnn, test_tensor_creation_ops, test_functional_autograd_benchmark, test_tensorboard, test_optim, test_binary_ufuncs, distributions/test_distributions, test_determination, test_foreach, test_type_promotion, distributed/optim/test_zero_redundancy_optimizer, distributed/test_c10d_spawn, test_cpp_api_parity, test_sort_and_select, test_fx, test_fx_experimental, test_mobile_optimizer, test_shape_ops, test_indexing, test_namedtuple_return_api, test_op_aliases, test_logging, test_testing, benchmark_utils/test_benchmark_utils, test_futures, test_numpy_interop, test_complex, test_native_functions, test_numba_integration, test_function_schema, distributed/test_jit_c10d, test_pytree, test_show_pickle, test_license, distributed/nn/jit/test_instantiator, test_public_bindings, test_dataset, test_vulkan, distributions/test_constraints, test_pruning_op, distributed/pipeline/sync/skip/test_api, distributed/pipeline/sync/skip/test_gpipe, distributed/pipeline/sync/skip/test_inspect_skip_layout, distributed/pipeline/sync/skip/test_leak, distributed/pipeline/sync/skip/test_portal, distributed/pipeline/sync/skip/test_stash_pop, distributed/pipeline/sync/skip/test_tracker, distributed/pipeline/sync/skip/test_verify_skippables, distributed/pipeline/sync/test_balance, distributed/pipeline/sync/test_bugs, distributed/pipeline/sync/test_checkpoint, distributed/pipeline/sync/test_copy, distributed/pipeline/sync/test_deferred_batch_norm, distributed/pipeline/sync/test_dependency, distributed/pipeline/sync/test_inplace, distributed/pipeline/sync/test_microbatch, distributed/pipeline/sync/test_phony, distributed/pipeline/sync/test_pipe, distributed/pipeline/sync/test_pipeline, distributed/pipeline/sync/test_stream, distributed/pipeline/sync/test_transparency, 
distributed/pipeline/sync/test_worker
Mar 12 00:47:04 Running test_ops ... [2021-03-12 00:47:04.744114]
Mar 12 00:47:04 Executing ['/opt/conda/bin/python', 'test_ops.py', '-v'] ... [2021-03-12 00:47:04.744142]
Mar 12 00:47:04 Grabbing reports from nightly commit: 56e7889e526256d9c16bce6ac84bdb3ea206bd79
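
To illustrate the failure mode, here is a minimal sketch of how two invocations that see different timing data can disagree about where a test belongs. It is not the actual run_test.py logic: `calculate_shards`, the round-robin fallback, the test list, and the timings are all hypothetical and chosen only to reproduce a duplicated test_nn.

```python
from typing import Dict, List, Optional


def calculate_shards(
    tests: List[str], num_shards: int, job_times: Optional[Dict[str, float]]
) -> List[List[str]]:
    """Hypothetical sharder: greedy by duration when stats exist, else round-robin."""
    if not job_times:
        # Assumed fallback when previous run stats cannot be downloaded.
        return [tests[i::num_shards] for i in range(num_shards)]
    shards: List[List[str]] = [[] for _ in range(num_shards)]
    totals = [0.0] * num_shards
    # Place the longest tests first, always into the currently lightest shard.
    for test in sorted(tests, key=lambda t: job_times.get(t, 0.0), reverse=True):
        idx = totals.index(min(totals))
        shards[idx].append(test)
        totals[idx] += job_times.get(test, 0.0)
    return shards


# Made-up test list and timings (seconds), for illustration only.
TESTS = ["test_autograd", "test_ops", "test_nn", "test_linalg"]
STATS = {"test_autograd": 100.0, "test_ops": 60.0, "test_nn": 50.0, "test_linalg": 5.0}

# Shard 1 is computed without stats; shard 2 is computed later, with stats.
shard1 = calculate_shards(TESTS, 2, None)[0]
shard2 = calculate_shards(TESTS, 2, STATS)[1]

duplicates = sorted(set(shard1) & set(shard2))             # run twice
missing = sorted(set(TESTS) - set(shard1) - set(shard2))   # never run

print("shard 1:", shard1)
print("shard 2:", shard2)
print("duplicates:", duplicates)
print("missing:", missing)
```

In this toy setup test_nn lands in both shards and test_linalg in neither. Each invocation is internally consistent, but the two partitions were derived from different inputs, so their union is no longer a partition of the test list.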

cc @ezyang @seemethere @malfet @walterddr @pytorch/pytorch-dev-infra

1 possible answer(s) on “Report Results” test intermittently fails for unsharded tests with “Duplicate test case…”

  1. @janeyx99 this sounds like a good idea.
    How about the following:

    • Add a --dump-shard-file option to run_test.py that dumps the current sharding (computed from previous run stats) into a file.
    • Have run_test.py use .pytorch_test_shard if it is present and was generated by the same version of run_test.py (the version can be determined from the commit or from source file hashes). Otherwise, compute the sharding from previous run stats.
    • Modify the CI system to compute the shard file during the build stage, so that the test1 and test2 shards get consistent input even if the network is unavailable.
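
The proposal above could be sketched roughly as follows. This is only an illustration under stated assumptions: the JSON layout and the helper names (`source_version`, `dump_shard_file`, `load_or_compute_shards`) are hypothetical, the version check uses a source file hash (one of the two options mentioned), and only the `.pytorch_test_shard` file name comes from the proposal itself.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, List

SHARD_FILE = ".pytorch_test_shard"  # file name taken from the proposal


def source_version(script: Path) -> str:
    """Identify the script version by hashing its source (assumed scheme)."""
    return hashlib.sha256(script.read_bytes()).hexdigest()


def dump_shard_file(path: Path, version: str, shards: List[List[str]]) -> None:
    """Persist the sharding together with the version that produced it."""
    path.write_text(json.dumps({"version": version, "shards": shards}))


def load_or_compute_shards(
    path: Path, version: str, compute: Callable[[], List[List[str]]]
) -> List[List[str]]:
    """Reuse the shard file if it matches this version, else recompute."""
    if path.exists():
        try:
            data = json.loads(path.read_text())
        except json.JSONDecodeError:
            data = None
        if isinstance(data, dict) and data.get("version") == version:
            return data["shards"]
    return compute()


with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "run_test.py"
    script.write_text("# stand-in for the real script\n")
    shard_file = Path(tmp) / SHARD_FILE
    version = source_version(script)

    # Build stage: compute the sharding once and write it out.
    dump_shard_file(
        shard_file, version, [["test_autograd", "test_nn"], ["test_ops", "test_linalg"]]
    )

    def stats_unavailable() -> List[List[str]]:
        raise RuntimeError("previous run stats could not be fetched")

    # Test stage: the file matches, so the stats are never consulted.
    reused = load_or_compute_shards(shard_file, version, stats_unavailable)

    # A file written by a different script version is ignored.
    recomputed = load_or_compute_shards(shard_file, "other-version", lambda: [["a"], ["b"]])
```

Because both test jobs would read the same file produced during the build stage, they would see one partition regardless of whether the stats are reachable when each job starts.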