Problem description
I am using Azure ML Studio to train an ALBERT question-answering model on the SQuAD dataset, and I get the error shown below. This is the code I ran.
# Clone transformers github repo
!git clone https://github.com/huggingface/transformers \
&& cd transformers \
&& git checkout a3085020ed0d81d4903c50967687192e3101e770
# Install libraries
# !pip install ./transformers
!pip install transformers
!pip install tensorboardX
# Get data
! mkdir dataset \
&& cd dataset \
&& wget https://rajpurkar.github.io/squad-explorer/dataset/train-v2.0.json \
&& wget https://rajpurkar.github.io/squad-explorer/dataset/dev-v2.0.json
# Train model
!export squad_DIR=/content/dataset \
&& python transformers/examples/run_squad.py \
--model_type albert \
--model_name_or_path albert-base-v2 \
--do_train \
--do_eval \
--do_lower_case \
--train_file $squad_DIR/train-v2.0.json \
--predict_file $squad_DIR/dev-v2.0.json \
--per_gpu_train_batch_size 12 \
--learning_rate 3e-5 \
--num_train_epochs 1.0 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir /content/model_output \
--save_steps 1000 \
--threads 4 \
--version_2_with_negative
I am using an NVIDIA Tesla K80 GPU. When I run the cell above to train the model, I get the following error:
2020-10-31 01:31:45.732913: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
2020-10-31 01:31:45.733023: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
2020-10-31 01:31:45.733043: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Traceback (most recent call last):
  File "transformers/examples/run_squad.py", line 32, in <module>
    from transformers import (
  File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/transformers/__init__.py", line 135, in <module>
    from .pipelines import (
  File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/transformers/pipelines.py", line 47, in <module>
    from .modeling_tf_auto import (
  File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/transformers/modeling_tf_auto.py", line 45, in <module>
    from .modeling_tf_albert import (
  File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/transformers/modeling_tf_albert.py", line 24, in <module>
    from .activations_tf import get_tf_activation
  File "/anaconda/envs/azureml_py36/lib/python3.6/site-packages/transformers/activations_tf.py", line 53, in <module>
    "swish": tf.keras.activations.swish,
AttributeError: module 'tensorflow_core.python.keras.api._v2.keras.activations' has no attribute 'swish'
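Solution
Solved.
The traceback shows that the installed TensorFlow has no tf.keras.activations.swish, which the installed transformers package expects at import time. Upgrading TensorFlow fixed it. I needed to add: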
!pip3 install --upgrade tensorflow-gpu # for Python 3.n and GPU
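To confirm that the upgrade took effect, one option is to check whether the attribute named in the AttributeError now resolves. The following is only a minimal sketch: the helper name has_attr_path is hypothetical (it is not part of TensorFlow or transformers), and it uses nothing beyond the standard library.

```python
import importlib


def has_attr_path(module_name, attr_path):
    """Return True if `module_name` imports and the dotted `attr_path` resolves on it.

    Hypothetical helper for illustration; not part of any library.
    """
    try:
        obj = importlib.import_module(module_name)
    except ImportError:
        # The module itself is not installed in this environment.
        return False
    for part in attr_path.split("."):
        if not hasattr(obj, part):
            return False
        obj = getattr(obj, part)
    return True


# In the notebook, after the upgrade (restart the kernel first so the
# new TensorFlow is the one that gets imported):
#   has_attr_path("tensorflow", "keras.activations.swish")
```

If the call in the final comment returns False after the upgrade, the kernel is probably still using the old TensorFlow installation, or pip installed into a different environment than the azureml_py36 one shown in the traceback.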