Environment info
transformers
version: 4.4.0.dev0- Platform: Windows
- Python version : 3.7.10
- Using GPU in script?: Issue is with both
- Using distributed or parallel set-up in script?: Single device
@patrickvonplaten (because the issue is with longformers)
@LysandreJik (because the issue is with tokenizers)
Information
Model I am using : Longformer.
The problem arises when loading tokenizer using from_pretrained() function.
The tasks I am working on is Question Answering but it does not matter since I am facing this issue while loading any kind of Longformer:
To reproduce
Steps to reproduce the behavior:
- Install Transformers
- import Transformers
- run tokenizer = transformers.AutoTokenizer.from_pretrained(MODEL_NAME)
Reference Code:
!pip3 install git+https://github.com/huggingface/transformers
import transformers
DEEP_LEARNING_MODEL_NAME = "mrm8488/longformer-base-4096-finetuned-squadv2" # Not working for 4.4.0.dev0
# DEEP_LEARNING_MODEL_NAME = "a-ware/longformer-QA" # Not working for 4.4.0.dev0
# DEEP_LEARNING_MODEL_NAME = "valhalla/longformer-base-4096-finetuned-squadv1" # Not working for 4.4.0.dev0
# DEEP_LEARNING_MODEL_NAME = "allenai/longformer-base-4096" # Not working for 4.4.0.dev0
# DEEP_LEARNING_MODEL_NAME = "deepset/roberta-base-squad2" # Working for 4.4.0.dev0
# DEEP_LEARNING_MODEL_NAME = "mrm8488/bert-base-portuguese-cased-finetuned-squad-v1-pt" # Working for 4.4.0.dev0
tokenizer = transformers.AutoTokenizer.from_pretrained(DEEP_LEARNING_MODEL_NAME)
Reference Colab notebook:
https://colab.research.google.com/drive/1v10E77og3-7B2_aFfYhrHvzBZzRlo7wo#scrollTo=2zHj2lMsFuv3
Further Information:
- This issue started appearing today. It was working fine till yesterday.
- This issue is only with 4.4.0 dev version. This issue does not occur for pip install transformers (which is currently on version 4.3.3)
- The issue is only while loading tokenizers, not models
- The issue is only while loading longformers (any longformer model). Other models’ tokenizers are loaded correctly (for example ‘deepset/roberta-base-squad2’ tokenizer can be loaded correctly)
It should now be fixed on
master
. Thanks a lot for using themaster
branch and letting us know of the issue!