As we were planning to add --max_train_samples --max_val_samples --max_test_samples
to all examples #10423, I thought is there any reason why we don’t expand the Trainer to handle that?
It surely would be useful to be able to truncate the dataset at the point of Trainer to enable quick testing.
Another plus is that the metrics can then automatically include the actual number of samples run, rather than how it is done at the moment in examples.
That way this functionality would be built-in and examples will get it for free.
TODO:
- port
--max_train_samples --max_val_samples --max_test_samples
to Trainer and remove the then unneeded code inrun_seq2seq.py
- extend metrics to report the number of samples as it’s done now in:
transformers/examples/seq2seq/run_seq2seq.py
Line 590
in
aca6288
so that all scripts automatically get this metric reported. Most likely it should be done here:
Yes, I think it’s the best solution.