Environment info
- transformers version: 4.3.3
- Platform: Linux-4.15.0-29-generic-x86_64-with-glibc2.10
- Python version: 3.8.8
- PyTorch version (GPU?): 1.8.0 (False)
- Tensorflow version (GPU?): not installed (NA)
Information
Model I am using: Wav2Vec2.0
The problem arises when using:
Scripts:
import soundfile as sf
import torch
from transformers import AutoTokenizer, AutoModel, Wav2Vec2ForCTC, Wav2Vec2Tokenizer
tokenizer4 = AutoTokenizer.from_pretrained("facebook/wav2vec2-large-xlsr-53")
model4 = AutoModel.from_pretrained("facebook/wav2vec2-large-xlsr-53")
OSError:
OSError: Can't load tokenizer for 'facebook/wav2vec2-large-xlsr-53'. Make sure that:
- 'facebook/wav2vec2-large-xlsr-53' is a correct model identifier listed on 'https://huggingface.co/models'
- or 'facebook/wav2vec2-large-xlsr-53' is the correct path to a directory containing relevant tokenizer files
The task I am working on is:
- an official wav2vec task: facebook/wav2vec2-large-xlsr-53
To reproduce
Steps to reproduce the behavior:
Follow the instructions at https://huggingface.co/facebook/wav2vec2-large-xlsr-53
Expected behavior
I am trying to use the XLSR model as the pre-trained model to fine-tune my own ASR model, but the XLSR model, and especially its tokenizer, cannot be loaded. Could you tell me how to fix this? Thank you very much!
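The model doesn't contain the tokenizer and preprocessing files. It is a pre-trained checkpoint only, so there is no vocabulary for AutoTokenizer to load.
Check out this notebook to see how to build your own vocab, tokenizer, etc.:
https://huggingface.co/blog/fine-tune-xlsr-wav2vec2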
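For reference, here is a minimal sketch of the approach the linked post describes: build a character-level vocabulary from your own transcripts, then create the tokenizer from that file instead of loading one from the hub. The helper names `build_vocab` and `load_for_finetuning` are hypothetical, made up for this example. The second function assumes a transformers release that ships `Wav2Vec2CTCTokenizer`, `Wav2Vec2FeatureExtractor` and `Wav2Vec2Processor`, which may be newer than the 4.3.3 reported above, and the feature extractor settings are taken from the blog post rather than from the checkpoint itself.

```python
import json


def build_vocab(transcripts):
    """Map every character seen in the transcripts to an integer id.

    Hypothetical helper; assumes the transcripts are already normalized
    (e.g. lower-cased, punctuation removed) and do not contain "|".
    """
    chars = sorted(set("".join(transcripts)))
    vocab = {ch: idx for idx, ch in enumerate(chars)}
    # Use "|" as a visible word delimiter in place of the space character.
    if " " in vocab:
        vocab["|"] = vocab.pop(" ")
    # Special tokens: unknown character and padding (the CTC blank).
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)
    return vocab


def save_vocab(vocab, path="vocab.json"):
    """Write the vocabulary as JSON so the tokenizer can read it."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(vocab, f, ensure_ascii=False)
    return path


def load_for_finetuning(vocab_path, checkpoint="facebook/wav2vec2-large-xlsr-53"):
    """Build a processor from a local vocab and load the pre-trained weights.

    Hypothetical helper. transformers is imported lazily so the vocab
    helpers above can be used without it; calling this downloads the model.
    """
    from transformers import (
        Wav2Vec2CTCTokenizer,
        Wav2Vec2FeatureExtractor,
        Wav2Vec2ForCTC,
        Wav2Vec2Processor,
    )

    # The tokenizer comes from the local vocab file, not from the hub.
    tokenizer = Wav2Vec2CTCTokenizer(
        vocab_path,
        unk_token="[UNK]",
        pad_token="[PAD]",
        word_delimiter_token="|",
    )
    # Settings follow the blog post: 16 kHz mono audio, normalized input.
    feature_extractor = Wav2Vec2FeatureExtractor(
        feature_size=1,
        sampling_rate=16000,
        padding_value=0.0,
        do_normalize=True,
        return_attention_mask=True,
    )
    processor = Wav2Vec2Processor(
        feature_extractor=feature_extractor, tokenizer=tokenizer
    )
    # The CTC head is newly initialized and sized to the custom vocabulary.
    model = Wav2Vec2ForCTC.from_pretrained(
        checkpoint,
        pad_token_id=tokenizer.pad_token_id,
        vocab_size=len(tokenizer),
    )
    return processor, model
```

A typical call would be `processor, model = load_for_finetuning(save_vocab(build_vocab(my_transcripts)))`, where `my_transcripts` is your own list of training sentences. The CTC output layer starts untrained, so the model still has to be fine-tuned before it produces usable transcriptions.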