Problem running GPT2Simple

Problem description

I am trying to run this GPT2Simple example, but I get an error:

Original stack trace for 'model/MatMul':
  File "c:/Users/Jerome Ariola/Desktop/Machine Learning Projects/gpt test.py",line 32,in <module>
    steps=1)
  File "C:\Program Files\python36\lib\site-packages\gpt_2_simple\gpt_2.py",line 198,in finetune
    output = model.model(hparams=hparams,X=context,gpus=gpus)
  File "C:\Program Files\python36\lib\site-packages\gpt_2_simple\src\model.py",line 212,in model
    logits = tf.matmul(h_flat,wte,transpose_b=True)
  File "C:\Program Files\python36\lib\site-packages\tensorflow_core\python\util\dispatch.py",line 180,in wrapper
    return target(*args,**kwargs)
  File "C:\Program Files\python36\lib\site-packages\tensorflow_core\python\ops\math_ops.py",line 2754,in matmul
    a,b,transpose_a=transpose_a,transpose_b=transpose_b,name=name)
  File "C:\Program Files\python36\lib\site-packages\tensorflow_core\python\ops\gen_math_ops.py",line 6136,in mat_mul
    name=name)
  File "C:\Program Files\python36\lib\site-packages\tensorflow_core\python\framework\op_def_library.py",line 794,in _apply_op_helper
    op_def=op_def)
  File "C:\Program Files\python36\lib\site-packages\tensorflow_core\python\util\deprecation.py",line 507,in new_func
    return func(*args,**kwargs)
  File "C:\Program Files\python36\lib\site-packages\tensorflow_core\python\framework\ops.py",line 3357,in create_op
    attrs,op_def,compute_device)
  File "C:\Program Files\python36\lib\site-packages\tensorflow_core\python\framework\ops.py",line 3426,in _create_op_internal
    op_def=op_def)
  File "C:\Program Files\python36\lib\site-packages\tensorflow_core\python\framework\ops.py",line 1748,in __init__
    self._traceback = tf_stack.extract_stack()

Here is the code, taken from https://github.com/minimaxir/gpt-2-simple

I also downgraded from TensorFlow 2.0 to TensorFlow 1.15 because of problems with tf.contrib and other things.

# https://github.com/minimaxir/gpt-2-simple

import gpt_2_simple as gpt2
import os
import requests

model_name = "124M"
# Download the pretrained model once; it is saved under ./models/124M/
if not os.path.isdir(os.path.join("models", model_name)):
    print(f"Downloading {model_name} model...")
    gpt2.download_gpt2(model_name=model_name)

file_name = "shakespeare.txt"

# Fetch the training text only if it is not already on disk
if not os.path.isfile(file_name):
    url = "https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt"
    data = requests.get(url)

    with open(file_name, 'w') as f:
        f.write(data.text)

# Fine-tune for a single step, then sample from the fine-tuned model
sess = gpt2.start_tf_sess()
gpt2.finetune(sess, file_name, model_name=model_name, steps=1)

gpt2.generate(sess)

Solution

Update: I downgraded again. I had first gone from tf2.0 to tf1.15, and have now gone down to tf1.14. Still the same error.

This is the full error I get (or at least the part from where the allocator stops):

Limit:                  6696213545
InUse:                  6693793536
MaxInUse:               6693795584
NumAllocs:                    2032
MaxAllocSize:            268435456

2021-03-19 01:21:53.793259: W tensorflow/core/common_runtime/bfc_allocator.cc:319] ***x************************************************************************************************
2021-03-19 01:21:53.798596: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at cwise_ops_common.cc:70 : Resource exhausted: OOM when allocating tensor with shape[1,12,1024,1024] and type bool on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\client\session.py",line 1356,in _do_call
    return fn(*args)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\client\session.py",line 1341,in _run_fn
    options,feed_dict,fetch_list,target_list,run_metadata)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\client\session.py",line 1429,in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,3072] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[{{node model/h11/mlp/Pow}}]]
Hint: If you want to see a list of allocated tensors when OOM happens,add report_tensor_allocations_upon_oom to RunOptions for current allocation info.


During handling of the above exception,another exception occurred:

Traceback (most recent call last):
  File "c:/Users/Jerome Ariola/Desktop/Desktop 2021/Machine Learning Projects/Drake bot/gpt test.py",line 32,in <module>
    steps=1)
  File "C:\Program Files\Python36\lib\site-packages\gpt_2_simple\gpt_2.py",line 337,in finetune
    opt_compute,feed_dict={context: sample_batch()})
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\client\session.py",line 950,in run
    run_metadata_ptr)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\client\session.py",line 1173,in _run
    feed_dict_tensor,options,line 1350,in _do_run
    run_metadata)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\client\session.py",line 1370,in _do_call
    raise type(e)(node_def,op,message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,3072] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[node model/h11/mlp/Pow (defined at C:\Program Files\Python36\lib\site-packages\gpt_2_simple\src\model.py:56) ]]
Hint: If you want to see a list of allocated tensors when OOM happens,add report_tensor_allocations_upon_oom to RunOptions for current allocation info.


Errors may have originated from an input operation.
Input Source operations connected to node model/h11/mlp/Pow:
 model/h11/mlp/c_fc/Reshape_2 (defined at C:\Program Files\Python36\lib\site-packages\gpt_2_simple\src\model.py:85)

Original stack trace for 'model/h11/mlp/Pow':
  File "c:/Users/Jerome Ariola/Desktop/Desktop 2021/Machine Learning Projects/Drake bot/gpt test.py",line 198,in finetune
    output = model.model(hparams=hparams,X=context,gpus=gpus)
  File "C:\Program Files\Python36\lib\site-packages\gpt_2_simple\src\model.py",line 197,in model
    h,present = block(h,'h%d' % layer,past=past,hparams=hparams)
  File "C:\Program Files\Python36\lib\site-packages\gpt_2_simple\src\model.py",line 158,in block
    m = mlp(norm(x,'ln_2'),'mlp',nx*4,line 148,in mlp
    h = gelu(conv1d(x,'c_fc',n_state))
  File "C:\Program Files\Python36\lib\site-packages\gpt_2_simple\src\model.py",line 56,in gelu
    return 0.5*x*(1+tf.tanh(np.sqrt(2/np.pi)*(x+0.044715*tf.pow(x,3))))
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\util\dispatch.py",line 180,in wrapper
    return target(*args,**kwargs)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\math_ops.py",line 450,in pow
    return gen_math_ops._pow(x,y,name=name)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\gen_math_ops.py",line 7382,in _pow
    "Pow",x=x,y=y,name=name)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\op_def_library.py",line 788,in _apply_op_helper
    op_def=op_def)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\util\deprecation.py",line 507,in new_func
    return func(*args,**kwargs)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\ops.py",line 3616,in create_op
    op_def=op_def)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\ops.py",line 2005,in __init__
    self._traceback = tf_stack.extract_stack()
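
For what it's worth, the numbers in the log can be checked by hand. The sketch below only redoes the arithmetic on values copied from the allocator summary and the two OOM messages; the helper name tensor_bytes and the byte sizes (1 byte for bool, 4 bytes for float, i.e. float32) are my own assumptions, not something gpt-2-simple or TensorFlow prints.

```python
# Bytes per element for the two dtypes named in the OOM messages.
# Assumption: "float" in the log means 32-bit float, bool is 1 byte.
DTYPE_BYTES = {"bool": 1, "float": 4}


def tensor_bytes(shape, dtype):
    """Return the size in bytes of a dense tensor with this shape and dtype."""
    count = 1
    for dim in shape:
        count *= dim
    return count * DTYPE_BYTES[dtype]


# Values copied from the allocator summary in the log.
limit = 6696213545
in_use = 6693793536
free_bytes = limit - in_use  # 2420009 bytes, roughly 2.3 MiB

# Shapes copied from the two OOM messages.
mask_bytes = tensor_bytes([1, 12, 1024, 1024], "bool")  # 12582912 bytes = 12 MiB
act_bytes = tensor_bytes([1, 3072], "float")            # 12288 bytes = 12 KiB

print(f"pool in use: {in_use / limit:.2%}")
print(f"free in pool: {free_bytes} bytes")
print(f"bool tensor [1,12,1024,1024]: {mask_bytes} bytes")
print(f"float tensor [1,3072]: {act_bytes} bytes")
```

So the GPU pool was essentially full when the failure happened: the 12 MiB bool tensor could not fit into the roughly 2.3 MiB that was left. The 12 KiB float request is smaller than that figure, so the summary is presumably just a snapshot, or the remaining space may have been fragmented. Either way this reads as a GPU memory problem rather than a TensorFlow version problem, which would fit with the downgrades making no difference.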

PS C:\Users\Jerome Ariola\Desktop\Desktop 2021\Machine Learning Projects>
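
Since the log ends in a ResourceExhaustedError on GPU:0, one thing that could be tried is keeping TensorFlow off the GPU for this one-step test. The sketch below is only a suggestion, not a confirmed fix: it sets the CUDA_VISIBLE_DEVICES environment variable before TensorFlow is imported so that no CUDA device is visible to the process, then repeats the same calls as the script in the question. The function name finetune_on_cpu is hypothetical, and training on the CPU will be slow.

```python
import os

# Hide all CUDA devices from this process. This must happen before
# TensorFlow (and therefore gpt_2_simple) is imported.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


def finetune_on_cpu(file_name="shakespeare.txt", model_name="124M", steps=1):
    """Hypothetical wrapper: same calls as in the question, run without a GPU."""
    # Imported here, after the environment variable is set, so that
    # TensorFlow does not claim the GPU when it initializes.
    import gpt_2_simple as gpt2

    sess = gpt2.start_tf_sess()
    gpt2.finetune(sess, file_name, model_name=model_name, steps=steps)
    return sess


if __name__ == "__main__":
    # Only start training when the dataset from the question is present.
    if os.path.isfile("shakespeare.txt"):
        finetune_on_cpu()
```

If the single step completes this way, that would suggest the script itself is fine and the GPU simply did not have enough free memory for it at the time.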