Problem description
I want to measure only the inference time on the Jetson TX2. How can I improve my function to do that? Right now I am measuring the whole do_inference call, i.e. the copy of the image from the CPU to the GPU, the inference itself, and the copy of the result back to the CPU.
Or is that not possible because of the way the GPU works? I mean, if I split the function into three parts, how many times would I have to call stream.synchronize()?
Thanks
Code in INFERENCE.PY
import pycuda.driver as cuda  # imports added so the snippet is self-contained
import tensorrt as trt

def do_inference(engine, pics_1, h_input, d_input, h_output, d_output, stream, batch_size):
    """
    This is the function to run the inference.
    Args:
        engine : The TensorRT engine (ICudaEngine) used to create the execution context.
        pics_1 : Input images to the model.
        h_input: Input buffer on the host (CPU); assumed to already hold the preprocessed pics_1 data.
        d_input: Input buffer on the device (GPU).
        h_output: Output buffer on the host (CPU).
        d_output: Output buffer on the device (GPU).
        stream: CUDA stream.
        batch_size: Batch size for execution time.
    Output:
        The list of output images.
    """
    # Context for executing inference using ICudaEngine
    with engine.create_execution_context() as context:
        # Transfer input data from the CPU to the GPU (host buffer -> device buffer).
        cuda.memcpy_htod_async(d_input, h_input, stream)
        # Run inference.
        # context.profiler = trt.Profiler()  # shows the execution time (ms) of each layer
        context.execute(batch_size=1, bindings=[int(d_input), int(d_output)])
        # Transfer predictions back from the GPU to the CPU.
        cuda.memcpy_dtoh_async(h_output, d_output, stream)
        # Synchronize the stream.
        stream.synchronize()
        # Return the host output.
        out = h_output
        return out
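To the question of how many times stream.synchronize() would be needed: a host-side timer can only see work that has already finished, so the stream has to be synchronized at every point where a timestamp is read. Below is a minimal sketch of how do_inference could be split into three stages so that time.perf_counter() brackets only the inference; it is not the original code, and the helper name do_inference_timed, the use of execute_async, and the two-value return are assumptions.

import time

import pycuda.driver as cuda

def do_inference_timed(context, h_input, d_input, h_output, d_output, stream, batch_size=1):
    # Stage 1: host -> device copy of the input.
    cuda.memcpy_htod_async(d_input, h_input, stream)
    stream.synchronize()  # the copy must be finished before the inference timer starts

    # Stage 2: the inference itself, launched on the same stream.
    start = time.perf_counter()
    context.execute_async(batch_size=batch_size,
                          bindings=[int(d_input), int(d_output)],
                          stream_handle=stream.handle)
    stream.synchronize()  # wait for the kernels so the timer measures the real GPU work
    inference_ms = (time.perf_counter() - start) * 1000.0

    # Stage 3: device -> host copy of the result.
    cuda.memcpy_dtoh_async(h_output, d_output, stream)
    stream.synchronize()  # the result must be on the host before it is used

    return h_output, inference_ms

With this split there are three synchronize calls in total, but only the one after execute_async is required for the measurement itself; the first keeps the copy time out of the timed interval, and the last is needed before h_output is read.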
Code in TIMER.PY
for i in range(count):
    start = time.perf_counter()
    # Classification - calling TX2_classify.py
    # h_input, d_input, h_output, d_output and stream are assumed to be allocated beforehand.
    out = eng.do_inference(engine, image, h_input, d_input, h_output, d_output, stream, 1)
    inference_time = time.perf_counter() - start
    print("TIME")
    print(inference_time * 1000)
    print("\n")
    pred = postprocess_inception(out)
    print(pred)
    print("\n")