This is part of the ongoing effort to sync the example scripts.
In #10437 (comment) it was flagged that some scripts have test/predict, whereas others don’t.
Should we:
A. have all scripts have train/eval/predict
B. only have predict where it’s desired
I think we should have it in all scripts except the language-modeling ones -> it doesn’t make much sense there.
Sure @stas00,
I will take this. Actually, I would love to contribute more. I really enjoy contributing to this community.
Yes, that would be needed.
Like all other examples, the script is given as just that, an example. As said in the main README under “Why shouldn’t I use Transformers?”, the example scripts are not expected to work out of the box on every problem, and users are expected to change a few lines of code to adapt them to their needs. So like all other example scripts, the `run_qa` script will support any dataset that is structured the same way as the original dataset that was used with it (squad), but if users want the script to work on another dataset structured differently, they will need to tweak it to their needs.

Good job getting to the root of the issue, I hadn’t thought of that when you added the `max_sample_xxx` arguments, but this task uses a subclass of the main `Trainer` that does require the original `eval_examples`. The fix you propose appears good to me and you should definitely make a PR with it as soon as you can 🙂
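For reference, here is a rough sketch of the kind of change we are talking about. It is not the actual code of `run_qa.py`: the helper and its arguments are made up for illustration, only the `select`/`map` calls are real 🤗 Datasets methods.

```python
from datasets import Dataset


def build_eval_inputs(raw_validation: Dataset, prepare_validation_features, max_val_samples=None):
    """Hypothetical helper showing the fix: truncate the raw eval examples *before*
    tokenizing them, so that `eval_examples` and `eval_dataset` stay in sync for the
    post-processing step (which maps predictions back to spans of the original texts)."""
    eval_examples = raw_validation
    if max_val_samples is not None:
        # Keep only the first `max_val_samples` original examples.
        eval_examples = eval_examples.select(range(max_val_samples))
    # Tokenize the (possibly truncated) examples into model features.
    eval_dataset = eval_examples.map(
        prepare_validation_features,
        batched=True,
        remove_columns=eval_examples.column_names,
    )
    return eval_examples, eval_dataset
```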
For the predicting stage, note that the subclass of the `Trainer` will require a `test_dataset` and `test_examples` to work (basically, to interpret the predictions of the model as spans of the original texts, the `Trainer` needs the original texts).
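To make that concrete, here is a very rough sketch of what such a branch could look like. Everything in it is an assumption to be checked against the actual script: the `max_test_samples` argument, the reuse of the eval preprocessing function for the test split, and the exact signature of the subclassed `predict` method.

```python
# Hypothetical --do_predict branch for run_qa.py (illustration only).
# Names like training_args, data_args, datasets, prepare_validation_features
# and trainer come from the surrounding script.
if training_args.do_predict:
    # Assumes the loaded dataset has a "test" split.
    test_examples = datasets["test"]
    if data_args.max_test_samples is not None:
        # Same idea as the eval fix: truncate the original examples so they
        # stay in sync with the tokenized features built from them.
        test_examples = test_examples.select(range(data_args.max_test_samples))
    test_dataset = test_examples.map(
        prepare_validation_features,  # assumes the eval preprocessing can be reused
        batched=True,
        remove_columns=test_examples.column_names,
    )
    # The subclassed Trainer needs both the features and the original examples
    # to turn start/end logits back into answer spans of the original texts.
    predictions = trainer.predict(test_dataset, test_examples)
```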
I do think adding a `--do_predict` to `run_qa` is going to be a bit complex, so it should be treated separately. My advice would be to:

- first open a PR with the fix you propose, so that `eval_examples` is properly truncated when `max_val_samples` is passed;
- then tackle `--do_predict` (with its `test_dataset`/`test_examples` handling) in a separate PR.

Let me know if that makes sense to you and if you need any help along the way (or don’t want to do one of those steps yourself).