Unable To Load Pretrained Longformer Models’ Tokenizers

Environment info

  • transformers version: 4.4.0.dev0
  • Platform: Windows
  • Python version: 3.7.10
  • Using GPU in script?: The issue occurs both with and without a GPU
  • Using distributed or parallel set-up in script?: Single device

@patrickvonplaten (because the issue is with longformers)
@LysandreJik (because the issue is with tokenizers)

Information

Model I am using: Longformer.
The problem arises when loading a tokenizer with the from_pretrained() function.
The task I am working on is Question Answering, but the task does not matter: the issue occurs when loading any kind of Longformer.

To reproduce

Steps to reproduce the behavior:

  1. Install transformers from the master branch
  2. import transformers
  3. Run tokenizer = transformers.AutoTokenizer.from_pretrained(MODEL_NAME)

Reference Code:

!pip3 install git+https://github.com/huggingface/transformers
import transformers
DEEP_LEARNING_MODEL_NAME = "mrm8488/longformer-base-4096-finetuned-squadv2"               # Not working for 4.4.0.dev0
# DEEP_LEARNING_MODEL_NAME = "a-ware/longformer-QA"                                       # Not working for 4.4.0.dev0
# DEEP_LEARNING_MODEL_NAME = "valhalla/longformer-base-4096-finetuned-squadv1"            # Not working for 4.4.0.dev0
# DEEP_LEARNING_MODEL_NAME = "allenai/longformer-base-4096"                               # Not working for 4.4.0.dev0
# DEEP_LEARNING_MODEL_NAME = "deepset/roberta-base-squad2"                                # Working for 4.4.0.dev0
# DEEP_LEARNING_MODEL_NAME = "mrm8488/bert-base-portuguese-cased-finetuned-squad-v1-pt"   # Working for 4.4.0.dev0

tokenizer = transformers.AutoTokenizer.from_pretrained(DEEP_LEARNING_MODEL_NAME)

Reference Colab notebook:

https://colab.research.google.com/drive/1v10E77og3-7B2_aFfYhrHvzBZzRlo7wo#scrollTo=2zHj2lMsFuv3

Further Information:

  • This issue first appeared today; everything worked fine until yesterday.
  • The issue occurs only with the 4.4.0 dev version. It does not occur with pip install transformers (currently version 4.3.3).
  • The issue occurs only when loading tokenizers, not models.
  • The issue occurs only when loading Longformers (any Longformer model). Other models’ tokenizers load correctly (for example, the ‘deepset/roberta-base-squad2’ tokenizer loads without error).

1 possible answer on “Unable To Load Pretrained Longformer Models’ Tokenizers”:

  1. It should now be fixed on master. Thanks a lot for using the master branch and letting us know of the issue!