[Trainer] add --max_train_samples --max_val_samples --max_test_samples

Since we were planning to add --max_train_samples --max_val_samples --max_test_samples to all examples (#10423), is there any reason not to expand the Trainer to handle them instead?

It would certainly be useful to be able to truncate the dataset inside the Trainer to enable quick testing.

Another plus is that the metrics could then automatically include the actual number of samples run, instead of each example computing that number itself as is done at the moment.

That way this functionality would be built in and the examples would get it for free.
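
To make the idea concrete, here is a minimal sketch of the truncation step. The helper name truncate_dataset is hypothetical (it is not an existing Trainer API), and it assumes the dataset is either a datasets.Dataset-style object exposing select or a plain sliceable sequence:

```python
from typing import Optional


def truncate_dataset(dataset, max_samples: Optional[int] = None):
    """Hypothetical helper: keep at most `max_samples` items of `dataset`."""
    if dataset is None or max_samples is None:
        # Nothing to truncate: no dataset was passed or no limit was requested.
        return dataset
    if max_samples < 0:
        raise ValueError("max_samples must be non-negative")
    # Never ask for more samples than the dataset actually holds.
    n = min(max_samples, len(dataset))
    # `datasets.Dataset` exposes `select`; plain sequences fall back to slicing.
    if hasattr(dataset, "select"):
        return dataset.select(range(n))
    return dataset[:n]
```

The Trainer could apply something like this once per split (train, val, test) before building its dataloaders, so example scripts would only need to pass the flag through.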

TODO:

  1. port --max_train_samples --max_val_samples --max_test_samples to Trainer and remove the then unneeded code in run_seq2seq.py
  2. extend metrics to report the number of samples as it’s done now in:
metrics["train_samples"] = min(max_train_samples, len(train_dataset))

so that all scripts automatically get this metric reported. Most likely it should be done here:

def speed_metrics(split, start_time, num_samples=None):
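
One possible shape for that change, as a sketch only. The function below is a stand-in named speed_metrics_with_count (hypothetical) that keeps the same signature. The "{split}_samples" key is an assumption modelled on the "train_samples" metric above, and the runtime and throughput lines are illustrative rather than a copy of the existing function body:

```python
import time


def speed_metrics_with_count(split, start_time, num_samples=None):
    """Hypothetical variant of speed_metrics that also reports the sample count."""
    runtime = time.time() - start_time
    result = {f"{split}_runtime": round(runtime, 4)}
    if num_samples is not None:
        # Assumed new key: the number of samples actually processed for this split.
        result[f"{split}_samples"] = num_samples
        if runtime > 0:
            # Guard against division by zero on extremely short runs.
            result[f"{split}_samples_per_second"] = round(num_samples / runtime, 3)
    return result
```

With the count reported here, every script that goes through the Trainer would get the metric without computing min(max_train_samples, len(train_dataset)) itself.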

@sgugger
