[examples] should all examples support the predict stage?

This is part of the ongoing effort to sync the example scripts.

In #10437 (comment) it was flagged that some example scripts support a test/predict stage, whereas others don't.

Should we:
A. have all scripts support train/eval/predict, or
B. only have predict where it's desired?
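
For context, "supporting predict" here means the standard block most example scripts already have. A minimal sketch (the surrounding script is assumed to define `logger`, `trainer`, `training_args`, and a tokenized `predict_dataset`; names vary per script):

```python
import os

import numpy as np

# Hedged sketch of the common predict stage in the example scripts;
# `trainer`, `training_args`, `logger`, and `predict_dataset` come from
# the surrounding script and are assumptions here.
if training_args.do_predict:
    logger.info("*** Predict ***")
    predictions = trainer.predict(predict_dataset).predictions
    predictions = np.argmax(predictions, axis=1)  # e.g. for a classification head

    output_predict_file = os.path.join(training_args.output_dir, "predictions.txt")
    with open(output_predict_file, "w") as writer:
        for index, item in enumerate(predictions):
            writer.write(f"{index}\t{item}\n")
```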

@sgugger, @patil-suraj, @LysandreJik

Replies:

  1. I think we should have it on all scripts except the language-modeling ones, where it doesn't make much sense.

  2. Like all other examples, the script is given as just that, an example. As said in the main README under “Why shouldn’t I use Transformers?”:

    While we strive to present as many use cases as possible, the scripts in our examples folder are just that: examples. It is expected that they won't work out of the box on your specific problem and that you will be required to change a few lines of code to adapt them to your needs.
    

    So, like all other example scripts, the run_qa script will support any dataset structured the same way as the original dataset it was written for (SQuAD). If a user wants the script to work on a dataset structured differently, they will need to tweak it to their needs.
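
    As a concrete illustration (not from the thread), here is the SQuAD-style schema run_qa expects; a custom dataset exposing these columns should work unchanged:

    ```python
    from datasets import Dataset

    # A SQuAD-structured toy dataset: `question`/`context` strings plus an
    # `answers` dict with parallel `text` and `answer_start` lists.
    squad_like = Dataset.from_dict(
        {
            "id": ["0"],
            "title": ["examples"],
            "question": ["Which library ships the run_qa script?"],
            "context": ["The run_qa script ships with the Transformers examples."],
            "answers": [{"text": ["Transformers"], "answer_start": [33]}],
        }
    )
    print(squad_like[0]["answers"])  # {'text': ['Transformers'], 'answer_start': [33]}
    ```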

  3. Good job getting to the root of the issue. I hadn't thought of that when you added the max_sample_xxx arguments, but this task uses a subclass of the main Trainer that requires the original eval_examples.

    The fix you propose looks good to me, and you should definitely make a PR with it as soon as you can 🙂
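
    For readers following along, a plausible shape for such a fix (a sketch of the idea, not the merged patch; `data_args`, `max_val_samples`, and `prepare_validation_features` are the script's names, assumed here): truncate the raw examples first, then derive the features from the truncated set, so every prediction still has a matching original example.

    ```python
    # Sketch only: keep eval_examples and eval_dataset consistent when
    # max_val_samples is set, since the Trainer subclass maps predictions
    # back onto the raw examples.
    if data_args.max_val_samples is not None:
        eval_examples = eval_examples.select(range(data_args.max_val_samples))
    eval_dataset = eval_examples.map(
        prepare_validation_features,
        batched=True,
        remove_columns=eval_examples.column_names,
    )
    ```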

    For the predict stage, note that the subclass of the Trainer will require a test_dataset and test_examples to work (basically, to interpret the predictions of the model as spans of the original texts, the Trainer needs the original texts). I do think adding a --do_predict to run_qa is going to be a bit complex, so it should be treated separately; a sketch of what it might look like follows at the end of this comment. My advice would be to:

    1. make a PR with the fix for evaluation in run_qa/run_qa_beam_search when max_val_samples is passed
    2. make a PR to add predict in all but run_qa/run_qa_beam_search scripts (when it makes sense, of course)
    3. make a PR to add predict in run_qa/run_qa_beam_search

    Let me know if that makes sense to you and if you need any help along the way (or don’t want to do one of those steps yourself).
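
    To make step 3 concrete, a hedged sketch of what the predict stage in run_qa might look like. The two-argument `predict(test_dataset, test_examples)` call follows from the requirement described above, but the exact signature and output format are assumptions, not the final implementation:

    ```python
    # Sketch only: the QA Trainer subclass needs both the tokenized features
    # and the raw examples so it can post-process logits into answer spans.
    if training_args.do_predict:
        logger.info("*** Predict ***")
        results = trainer.predict(test_dataset, test_examples)
        # `results.predictions` would hold the formatted answers after
        # post-processing, e.g. entries pairing an example id with its
        # predicted answer text.
        print(results.predictions[:5])
    ```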