How can I measure only the inference time on the GPU using TensorRT and PyCUDA?

Problem description

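I want to measure only the inference time on a Jetson TX2. How can I improve my function to do that? Right now I am measuring:

  • the transfer of the image from CPU to GPU

  • the transfer of the results from GPU to CPU

  • the inference

Or is this not possible because of the way GPUs work? I mean, how many times would I have to call stream.synchronize() if I split the function into 3 parts:

  1. Transfer from CPU to GPU
  2. Inference
  3. Transfer from GPU to CPU

Thank you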

Code in INFERENCE.PY

def do_inference(engine,pics_1,h_input,d_input,h_output,d_output,stream,batch_size):

    """
    This is the function to run the inference
    Args:
      engine : Deserialized TensorRT engine (ICudaEngine).
      pics_1 : Input images to the model.  
      h_input: Input in the host (CPU). 
      d_input: Input in the device (GPU). 
      h_output: Output in the host (CPU). 
      d_output: Output in the device (GPU). 
      stream: CUDA stream.
      batch_size : Batch size for execution time.
    
    Output:
      The list of output images.

    """
      
    # Context for executing inference using ICudaEngine
    with engine.create_execution_context() as context:
        
        # Transfer input data from CPU to GPU.
        cuda.memcpy_htod_async(d_input, h_input, stream)

        # Run inference.
        #context.profiler = trt.Profiler() ##shows execution time(ms) of each layer
        context.execute(batch_size=1,bindings=[int(d_input),int(d_output)])

        # Transfer predictions back from the GPU to the CPU.
        cuda.memcpy_dtoh_async(h_output, d_output, stream)
        
        # Synchronize the stream.
        stream.synchronize()
        
        # Return the host output.
        out = h_output       
        return out
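
One way to separate the three steps, sketched under assumptions: CUDA events recorded on a stream are timestamped by the GPU in stream order, so a single wait at the end is enough to read back a time per stage. The names `time_stages` and `do_inference_timed` are hypothetical and not part of the code above. The wrapper assumes PyCUDA's `cuda.Event` (`record`, `synchronize`, `time_till`), page-locked host buffers (for example from `cuda.pagelocked_empty`), and the implicit-batch `context.execute_async(..., stream_handle=...)` from the same TensorRT API generation as `context.execute`, used here in place of the synchronous call so that inference is queued on the same stream as the copies.

```python
def time_stages(stages, stream, make_event):
    """Time consecutive stages queued on one stream.

    stages     : list of (name, fn); each fn enqueues its work on `stream`.
    stream     : the stream the events are recorded on.
    make_event : event factory, e.g. pycuda.driver.Event.
    Returns {name: elapsed milliseconds}.
    """
    marks = [make_event()]
    marks[0].record(stream)
    for _, enqueue in stages:
        enqueue()
        mark = make_event()
        mark.record(stream)  # marks the end of this stage in stream order
        marks.append(mark)
    # Single wait: the last event completes only after everything queued before it.
    marks[-1].synchronize()
    return {name: marks[i].time_till(marks[i + 1])
            for i, (name, _) in enumerate(stages)}


def do_inference_timed(context, h_input, d_input, h_output, d_output, stream):
    """Hypothetical wrapper: per-stage GPU times (ms) for one inference."""
    import pycuda.driver as cuda  # imported here so time_stages has no GPU dependency

    stages = [
        ("htod", lambda: cuda.memcpy_htod_async(d_input, h_input, stream)),
        ("inference", lambda: context.execute_async(
            batch_size=1,
            bindings=[int(d_input), int(d_output)],
            stream_handle=stream.handle)),
        ("dtoh", lambda: cuda.memcpy_dtoh_async(h_output, d_output, stream)),
    ]
    return time_stages(stages, stream, cuda.Event)
```

In the returned dictionary, the `inference` entry is the number the question asks for, while `htod` and `dtoh` are the two copy times. The execution context is passed in rather than created inside the function, so its creation is not part of any measured stage.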

Code in TIMER.PY

for i in range(count):
    start = time.perf_counter()
    # Classification - calling TX2_classify.py
    out = eng.do_inference(engine,image,1) 
    inference_time = time.perf_counter() - start
    print("TIME")
    print(inference_time * 1000)
    print("\n")
    pred = postprocess_inception(out)
    print(pred)
    print("\n")
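
The loop above times every call, and early calls often include one-time setup costs; `do_inference` also creates a new execution context on each call, which falls inside the timed region. A minimal sketch of a timing helper that discards a few warm-up calls and summarizes the rest follows. `benchmark`, `warmup`, and `runs` are hypothetical names, and `fn` is assumed to block until the GPU work is finished (as `do_inference` does through `stream.synchronize()`). It measures wall-clock time for the whole call, not GPU-only time.

```python
import statistics
import time


def benchmark(fn, warmup=5, runs=50):
    """Call fn() warmup + runs times; return wall-clock stats in ms for the last `runs` calls."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(samples),
        "median_ms": statistics.median(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
    }
```

With the names from the loop above, usage would look like `stats = benchmark(lambda: eng.do_inference(engine, image, 1))`.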
