Problem description
I am using TensorFlow's Universal Sentence Encoder (https://tfhub.dev/google/universal-sentence-encoder/4) to compute similarity between texts. When I reduce the number of text files in the large_data directory, the program runs fine. Now that there are many text files in it, however, the program crashes. Here is my code:
import glob
import os
import numpy as np
import pandas as pd
import tensorflow_hub as hub

module_url = 'https://tfhub.dev/google/universal-sentence-encoder/4'  # Pre-trained model URL
model = hub.load(module_url)

# text_files was undefined in the original snippet; listing the directory is
# one way to obtain the base file names it implies.
text_files = [os.path.splitext(os.path.basename(p))[0]
              for p in glob.glob('large_data/*.txt')]

documents = []
for f in text_files:
    with open('large_data/' + f + '.txt') as fh:  # close each file after reading
        documents.append(fh.read())

message_embeddings = model(documents)  # embed all documents in a single call
corr = np.inner(message_embeddings, message_embeddings)  # pairwise similarity matrix
df = pd.DataFrame(data=corr, index=documents, columns=documents)
df.to_csv('matrix.csv')
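The np.inner step above works because USE/4 returns one approximately unit-length 512-dimensional vector per input text, so the inner product of the embedding matrix with itself is (approximately) the cosine-similarity matrix. A minimal self-contained illustration, with random unit vectors standing in for real embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 512))                     # 4 fake "document embeddings"
emb /= np.linalg.norm(emb, axis=1, keepdims=True)   # normalize, like USE output

corr = np.inner(emb, emb)  # (4, 4) pairwise cosine-similarity matrix
# The diagonal is 1.0: each vector has similarity 1 with itself,
# and the matrix is symmetric.
```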
In the large_data folder there are about 20K text files (and they are large, roughly 5K words each).
Here are some logs:
2021-05-21 13:11:51.172674: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-05-21 13:11:51.172973: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-05-21 13:11:51.175407: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2021-05-21 13:11:54.710993: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-05-21 13:11:55.186770: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2593520000 Hz
sys:1: DtypeWarning: Columns (105, 106) have mixed types. Specify dtype option on import or set low_memory=False.
2021-05-21 13:14:41.195210: W tensorflow/core/common_runtime/bfc_allocator.cc:433] Allocator (mklcpu) ran out of memory trying to allocate 16.76GiB (rounded to 17995920128) requested by op StatefulPartitionedCall/StatefulPartitionedCall/text_preprocessor/add_bigrams/concat
Current allocation summary follows.
2021-05-21 13:14:41.195296: I tensorflow/core/common_runtime/bfc_allocator.cc:972] BFCAllocator dump for mklcpu
2021-05-21 13:14:41.195316: I tensorflow/core/common_runtime/bfc_allocator.cc:979] Bin (256): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-05-21 13:14:41.195330: I tensorflow/core/common_runtime/bfc_allocator.cc:979] Bin (512): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-05-21 13:14:41.195380: I tensorflow/core/common_runtime/bfc_allocator.cc:979] Bin (1024): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-05-21 13:14:41.195401: I tensorflow/core/common_runtime/bfc_allocator.cc:979] Bin (2048): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
.....
2021-05-21 13:14:41.195846: I tensorflow/core/common_runtime/bfc_allocator.cc:1001] Size: 893.18MiB | Requested Size: 4.2KiB | in_use: 0 | bin_num: 20, prev: Size: 32.55MiB | Requested Size: 32.55MiB | in_use: 1 | bin_num: -1
2021-05-21 13:14:41.195868: I tensorflow/core/common_runtime/bfc_allocator.cc:1001] Size: 2.00GiB | Requested Size: 1.60GiB | in_use: 0 | bin_num: 20
2021-05-21 13:14:41.195891: I tensorflow/core/common_runtime/bfc_allocator.cc:1001] Size: 2.08GiB | Requested Size: 0B | in_use: 0 | bin_num: 20, prev: Size: 16.76GiB | Requested Size: 16.76GiB | in_use: 1 | bin_num: -1
2021-05-21 13:14:41.195913: I tensorflow/core/common_runtime/bfc_allocator.cc:1001] Size: 4.00GiB | Requested Size: 2.41GiB | in_use: 0 | bin_num: 20
2021-05-21 13:14:41.195937: I tensorflow/core/common_runtime/bfc_allocator.cc:1001] Size: 15.24GiB | Requested Size: 0B | in_use: 0 | bin_num: 20, prev: Size: 16.76GiB | Requested Size: 16.76GiB | in_use: 1 | bin_num: -1
And finally, the error:
2021-05-21 13:14:41.200222: W tensorflow/core/common_runtime/bfc_allocator.cc:441] *****************************___*****************************___________________________________*_**
2021-05-21 13:14:41.200275: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at concat_op.cc:158 : Resource exhausted: OOM when allocating tensor with shape[10000,74983] and type string on /job:localhost/replica:0/task:0/device:CPU:0 by allocator mklcpu
Traceback (most recent call last):
  File "/root/petar/main.py", line 15, in <module>
    message_embeddings = model(documents)
  File "/root/miniconda3/envs/pipeline/lib/python3.9/site-packages/tensorflow/python/saved_model/load.py", line 668, in _call_attribute
    return instance.__call__(*args, **kwargs)
  File "/root/miniconda3/envs/pipeline/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "/root/miniconda3/envs/pipeline/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 894, in _call
    return self._concrete_stateful_fn._call_flat(
  File "/root/miniconda3/envs/pipeline/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 1918, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/root/miniconda3/envs/pipeline/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 555, in call
    outputs = execute.execute(
  File "/root/miniconda3/envs/pipeline/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[10000,74983] and type string on /job:localhost/replica:0/task:0/device:CPU:0 by allocator mklcpu
  [[{{node StatefulPartitionedCall/StatefulPartitionedCall/text_preprocessor/add_bigrams/concat}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[Op:__inference_restored_function_body_5285]
Workaround
No effective fix for this problem has been confirmed yet.
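One direction worth trying (a sketch, not a verified fix: it assumes the model itself fits in RAM and only the single 20K-document call does not) is to embed the documents in batches and stack the results. The OOM occurs inside the model's text preprocessor, which builds one giant string tensor for the whole input list, so a smaller batch size bounds that tensor's size. The batch size of 256 here is an assumption to tune against available memory.

```python
import numpy as np

def embed_in_batches(model, texts, batch_size=256):
    """Call `model` on successive slices of `texts` and stack the results.

    `model` is any callable mapping a list of strings to an array-like of
    embeddings, e.g. the hub.load(...) object from the original script.
    """
    parts = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        parts.append(np.asarray(model(batch)))  # materialize each batch on the host
    return np.concatenate(parts, axis=0)

# Intended use with the original script (not run here; downloads the model):
#   model = hub.load(module_url)
#   message_embeddings = embed_in_batches(model, documents)
#   corr = np.inner(message_embeddings, message_embeddings)
```

Note that even with batched embedding, the final 20000 x 20000 similarity matrix is itself ~3 GiB of float64, so the np.inner and DataFrame steps may also need care (e.g. float32, or writing the matrix in row blocks).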