
TensorRT: Deploying YOLOv8 Object Detection


1. Installing TensorRT

2. Using TensorRT and Converting the Model

  • Run trtexec.exe --help to list all available trtexec options.
  • Converting a trained YOLOv8 .pt model to TensorRT format:
    1. Export the .pt model to .onnx:
# In the yolov8 source directory, export the pt model to onnx format
yolo mode=export model=drone_best.pt format=onnx opset=11

    2. Build an FP16 TensorRT engine from the .onnx file:

trtexec.exe --onnx=drone_best.onnx --saveEngine=drone_best_16.engine --fp16
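The export above typically produces a static 1x3x640x640 input, so trtexec needs no shape flags. If the model were instead exported with dynamic axes (e.g. dynamic=True in the yolo export command), the build would have to pin the input shapes explicitly. A hypothetical invocation, assuming the input tensor is named images and the standard 640x640 YOLOv8 input size:

```shell
# Only needed for an ONNX model exported with dynamic axes; the tensor name
# "images" and the 1x3x640x640 shape are assumptions, not taken from the log
trtexec.exe --onnx=drone_best.onnx --saveEngine=drone_best_16.engine --fp16 --minShapes=images:1x3x640x640 --optShapes=images:1x3x640x640 --maxShapes=images:1x3x640x640
```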

PS D:\TensorRT-8.4.0.6.Windows10.x86_.cuda-11.6.cudnn8.3\TensorRT-8.4.0.6\bin> trtexec.exe --onnx=drone_best.onnx --saveEngine=drone_best_16.engine --fp16

&&&& RUNNING TensorRT.trtexec [TensorRT v8400] # C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\bin\trtexec.exe --onnx=drone_best.onnx --saveEngine=drone_best_16.engine --fp16
[11/19/2024-09:28:35] [I] === Model Options ===
[11/19/2024-09:28:35] [I] Format: ONNX
[11/19/2024-09:28:35] [I] Model: drone_best.onnx
[11/19/2024-09:28:35] [I] Output:
[11/19/2024-09:28:35] [I] === Build Options ===
[11/19/2024-09:28:35] [I] Max batch: explicit batch
[11/19/2024-09:28:35] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[11/19/2024-09:28:35] [I] minTiming: 1
[11/19/2024-09:28:35] [I] avgTiming: 8
[11/19/2024-09:28:35] [I] Precision: FP32+FP16
[11/19/2024-09:28:35] [I] LayerPrecisions:
[11/19/2024-09:28:35] [I] Calibration:
[11/19/2024-09:28:35] [I] Refit: Disabled
[11/19/2024-09:28:35] [I] Sparsity: Disabled
[11/19/2024-09:28:35] [I] Safe mode: Disabled
[11/19/2024-09:28:35] [I] DirectIO mode: Disabled
[11/19/2024-09:28:35] [I] Restricted mode: Disabled
[11/19/2024-09:28:35] [I] Save engine: drone_best_16.engine
[11/19/2024-09:28:35] [I] Load engine:
[11/19/2024-09:28:35] [I] Profiling verbosity: 0
[11/19/2024-09:28:35] [I] Tactic sources: Using default tactic sources
[11/19/2024-09:28:35] [I] timingCacheMode: local
[11/19/2024-09:28:35] [I] timingCacheFile:
[11/19/2024-09:28:35] [I] Input(s)s format: fp32:CHW
[11/19/2024-09:28:35] [I] Output(s)s format: fp32:CHW
[11/19/2024-09:28:35] [I] Input build shapes: model
[11/19/2024-09:28:35] [I] Input calibration shapes: model
[11/19/2024-09:28:35] [I] === System Options ===
[11/19/2024-09:28:35] [I] Device: 0
[11/19/2024-09:28:35] [I] DLACore:
[11/19/2024-09:28:35] [I] Plugins:
[11/19/2024-09:28:35] [I] === Inference Options ===
[11/19/2024-09:28:35] [I] Batch: Explicit
[11/19/2024-09:28:35] [I] Input inference shapes: model
[11/19/2024-09:28:35] [I] Iterations: 10
[11/19/2024-09:28:35] [I] Duration: 3s (+ 200ms warm up)
[11/19/2024-09:28:35] [I] Sleep time: 0ms
[11/19/2024-09:28:35] [I] Idle time: 0ms
[11/19/2024-09:28:35] [I] Streams: 1
[11/19/2024-09:28:35] [I] ExposeDMA: Disabled
[11/19/2024-09:28:35] [I] Data transfers: Enabled
[11/19/2024-09:28:35] [I] Spin-wait: Disabled
[11/19/2024-09:28:35] [I] Multithreading: Disabled
[11/19/2024-09:28:35] [I] CUDA Graph: Disabled
[11/19/2024-09:28:35] [I] Separate profiling: Disabled
[11/19/2024-09:28:35] [I] Time Deserialize: Disabled
[11/19/2024-09:28:35] [I] Time Refit: Disabled
[11/19/2024-09:28:35] [I] Skip inference: Disabled
[11/19/2024-09:28:35] [I] Inputs:
[11/19/2024-09:28:35] [I] === Reporting Options ===
[11/19/2024-09:28:35] [I] Verbose: Disabled
[11/19/2024-09:28:35] [I] Averages: 10 inferences
[11/19/2024-09:28:35] [I] Percentile: 99
[11/19/2024-09:28:35] [I] Dump refittable layers:Disabled
[11/19/2024-09:28:35] [I] Dump output: Disabled
[11/19/2024-09:28:35] [I] Profile: Disabled
[11/19/2024-09:28:35] [I] Export timing to JSON file:
[11/19/2024-09:28:35] [I] Export output to JSON file:
[11/19/2024-09:28:35] [I] Export profile to JSON file:
[11/19/2024-09:28:35] [I]
[11/19/2024-09:28:35] [I] === Device Information ===
[11/19/2024-09:28:35] [I] Selected Device: NVIDIA GeForce RTX 3060 Laptop GPU
[11/19/2024-09:28:35] [I] Compute Capability: 8.6
[11/19/2024-09:28:35] [I] SMs: 30
[11/19/2024-09:28:35] [I] Compute Clock Rate: 1.702 GHz
[11/19/2024-09:28:35] [I] Device Global Memory: 6143 MiB
[11/19/2024-09:28:35] [I] Shared Memory per SM: 100 KiB
[11/19/2024-09:28:35] [I] Memory Bus Width: 192 bits (ECC disabled)
[11/19/2024-09:28:35] [I] Memory Clock Rate: 7.001 GHz
[11/19/2024-09:28:35] [I]
[11/19/2024-09:28:35] [I] TensorRT version: 8.4.0
[11/19/2024-09:28:36] [I] [TRT] [MemUsageChange] Init CUDA: CPU +510, GPU +0, now: CPU 21076, GPU 1175 (MiB)
[11/19/2024-09:28:36] [I] [TRT] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 21297 MiB, GPU 1175 MiB
[11/19/2024-09:28:37] [I] [TRT] [MemUsageSnapshot] End constructing builder kernel library: CPU 21722 MiB, GPU 1297 MiB
[11/19/2024-09:28:37] [I] Start parsing network model
[11/19/2024-09:28:38] [I] [TRT] ----------------------------------------------------------------
[11/19/2024-09:28:38] [I] [TRT] Input filename:   drone_best.onnx
[11/19/2024-09:28:38] [I] [TRT] ONNX IR version:  0.0.6
[11/19/2024-09:28:38] [I] [TRT] Opset version:    11
[11/19/2024-09:28:38] [I] [TRT] Producer name:    pytorch
[11/19/2024-09:28:38] [I] [TRT] Producer version: 1.12.1
[11/19/2024-09:28:38] [I] [TRT] Domain:
[11/19/2024-09:28:38] [I] [TRT] Model version:    0
[11/19/2024-09:28:38] [I] [TRT] Doc string:
[11/19/2024-09:28:38] [I] [TRT] ----------------------------------------------------------------
[11/19/2024-09:28:38] [W] [TRT] onnx2trt_utils.cpp:365: Your ONNX model has been generated with INT weights, while TensorRT does not natively support INT. Attempting to cast down to INT32.
[11/19/2024-09:28:38] [I] Finish parsing network model
[11/19/2024-09:29:01] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.8.0 but loaded cuBLAS/cuBLAS LT 11.5.1
[11/19/2024-09:29:01] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +650, GPU +274, now: CPU 22239, GPU 1571 (MiB)
[11/19/2024-09:29:21] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +507, GPU +192, now: CPU 22746, GPU 1763 (MiB)
[11/19/2024-09:29:21] [W] [TRT] TensorRT was linked against cuDNN 8.3.2 but loaded cuDNN 8.2.1
[11/19/2024-09:29:21] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[11/19/2024-09:38:36] [I] [TRT] Detected 1 inputs and 3 output network tensors.
[11/19/2024-09:38:36] [I] [TRT] Total Host Persistent Memory: 162016
[11/19/2024-09:38:36] [I] [TRT] Total Device Persistent Memory: 1293312
[11/19/2024-09:38:36] [I] [TRT] Total Scratch Memory: 0
[11/19/2024-09:38:36] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 28 MiB, GPU 4375 MiB
[11/19/2024-09:38:36] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 16.7105ms to assign 8 blocks to 118 nodes requiring 278004 bytes.
[11/19/2024-09:38:36] [I] [TRT] Total Activation Memory: 278004
[11/19/2024-09:38:36] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +21, GPU +23, now: CPU 21, GPU 23 (MiB)
[11/19/2024-09:38:36] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 23281, GPU 2145 (MiB)
[11/19/2024-09:38:36] [I] [TRT] Loaded engine size: 24 MiB
[11/19/2024-09:38:36] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +22, now: CPU 0, GPU 22 (MiB)
[11/19/2024-09:38:37] [I] Engine built in 601.782 sec.
[11/19/2024-09:38:37] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +28, now: CPU 0, GPU 50 (MiB)
[11/19/2024-09:38:37] [I] Using random values for input images
[11/19/2024-09:38:37] [I] Created input binding for images with dimensions 1x3x640x640
[11/19/2024-09:38:37] [I] Using random values for output output0
[11/19/2024-09:38:37] [I] Created output binding for output0 with dimensions 1x6x8400
[11/19/2024-09:38:37] [I] Starting inference
[11/19/2024-09:38:40] [I] Warmup completed 13 queries over 200 ms
[11/19/2024-09:38:40] [I] Timing trace has 1107 queries over 3.00517 s
[11/19/2024-09:38:40] [I]
[11/19/2024-09:38:40] [I] === Trace details ===
[11/19/2024-09:38:40] [I] Trace averages of 10 runs:
[11/19/2024-09:38:40] [I] Average on 10 runs - GPU latency: 2.37514 ms - Host latency: 2.78601 ms (end to end 2.84978 ms, enqueue 1.03041 ms)
[11/19/2024-09:38:40] [I] Average on 10 runs - GPU latency: 2.37523 ms - Host latency: 2.78202 ms (end to end 2.83775 ms, enqueue 1.16537 ms)
[11/19/2024-09:38:40] [I] Average on 10 runs - GPU latency: 2.34488 ms - Host latency: 2.74876 ms (end to end 2.81259 ms, enqueue 1.85841 ms)
[11/19/2024-09:38:40] [I] Average on 10 runs - GPU latency: 2.33632 ms - Host latency: 2.73704 ms (end to end 2.8085 ms, enqueue 2.19702 ms)
[11/19/2024-09:38:40] [I] Average on 10 runs - GPU latency: 2.33363 ms - Host latency: 2.73338 ms (end to end 2.80588 ms, enqueue 2.06551 ms)
[11/19/2024-09:38:40] [I] Average on 10 runs - GPU latency: 2.37407 ms - Host latency: 2.77818 ms (end to end 2.84753 ms, enqueue 1.26068 ms)
...
[11/19/2024-09:38:40] [I] Average on 10 runs - GPU latency: 2.26147 ms - Host latency: 2.66597 ms (end to end 2.74114 ms, enqueue 0.734106 ms)
[11/19/2024-09:38:40] [I]
[11/19/2024-09:38:40] [I] === Performance summary ===
[11/19/2024-09:38:40] [I] Throughput: 368.365 qps
[11/19/2024-09:38:40] [I] Latency: min = 2.54675 ms, max = 4.16345 ms, mean = 2.818 ms, median = 2.61597 ms, percentile(99%) = 3.17688 ms
[11/19/2024-09:38:40] [I] End-to-End Host Latency: min = 2.59686 ms, max = 4.2196 ms, mean = 2.71256 ms, median = 2.6814 ms, percentile(99%) = 3.27051 ms
[11/19/2024-09:38:40] [I] Enqueue Time: min = 0.549805 ms, max = 6.67908 ms, mean = 1.17696 ms, median = 0.797852 ms, percentile(99%) = 3.27966 ms
[11/19/2024-09:38:40] [I] H2D Latency: min = 0.37915 ms, max = 0.4208 ms, mean = 0.384559 ms, median = 0.380859 ms, percentile(99%) = 0.414551 ms
[11/19/2024-09:38:40] [I] GPU Compute Time: min = 2.14734 ms, max = 3.72131 ms, mean = 2.24363 ms, median = 2.21387 ms, percentile(99%) = 2.74841 ms
[11/19/2024-09:38:40] [I] D2H Latency: min = 0.0183105 ms, max = 0.0762329 ms, mean = 0.0199879 ms, median = 0.0187988 ms, percentile(99%) = 0.0490723 ms
[11/19/2024-09:38:40] [I] Total Host Walltime: 3.00517 s
[11/19/2024-09:38:40] [I] Total GPU Compute Time: 2.4837 s
[11/19/2024-09:38:40] [W] * GPU compute time is unstable, with coefficient of variance = 5.25055%.
[11/19/2024-09:38:40] [W]   If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[11/19/2024-09:38:40] [I] Explanations of the performance metrics are printed in the verbose logs.
[11/19/2024-09:38:40] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8400] # C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\bin\trtexec.exe --onnx=drone_best.onnx --saveEngine=drone_best_16.engine --fp16
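Once built, the serialized engine can be timed again without repeating the ten-minute build by loading it directly. The warning at the end of the log suggests --useSpinWait for more stable measurements; a sketch combining the two (same engine file as above):

```shell
# Re-benchmark the saved engine; --loadEngine skips the build step entirely,
# and --useSpinWait trades CPU usage for more stable latency numbers
trtexec.exe --loadEngine=drone_best_16.engine --useSpinWait
```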

TensorRT error: the .onnx model is invalid

import onnx

model = onnx.load("test_delete/drone_best.onnx")
onnx.checker.check_model(model)

The code above loads an ONNX model and validates it. If the model is valid, check_model returns without any output; if it is not, it raises an onnx.checker.ValidationError describing the problem.

3. Deploying a YOLOv8 Object Detection Model with TensorRT

3.1 Source code

import tensorrt as trt
from torchvision import transforms
import torch as t
from collections import OrderedDict, namedtuple
import cv2 as cv
import time
import numpy as np

img_transform = transforms.Compose([transforms.ToTensor(),
                                    transforms.Resize((640, 640))  # must match the engine's 640x640 input
                                    ])

def load_classes():
    with open("test_delete/uva_names.txt", "r") as f:
        class_list = [cname.strip() for cname in f.readlines()]
    return class_list


def format_yolov8(frame):
    row, col, _ = frame.shape
    _max = max(col, row)
    result = np.zeros((_max, _max, 3), np.uint8)
    result[0:row, 0:col] = frame
    result = cv.cvtColor(result, cv.COLOR_BGR2RGB)
    return result


def wrap_detection(input_image, output_data):
    class_ids = []
    confidences = []
    boxes = []
    out_data = output_data.T
    rows = out_data.shape[0]

    # numpy image shape is (height, width, channels)
    image_height, image_width, _ = input_image.shape

    # scale from the 640x640 network input back to the (square, padded) image
    x_factor = image_width / 640.0
    y_factor = image_height / 640.0

    for r in range(rows):
        row = out_data[r]
        classes_scores = row[4:]
        class_id = np.argmax(classes_scores)
        if (classes_scores[class_id] > .25):
            class_ids.append(class_id)
            confidences.append(classes_scores[class_id])
            x, y, w, h = row[0].item(), row[1].item(), row[2].item(), row[3].item()
            left = int((x - 0.5 * w) * x_factor)
            top = int((y - 0.5 * h) * y_factor)
            width = int(w * x_factor)
            height = int(h * y_factor)
            box = np.array([left, top, width, height])
            boxes.append(box)

    indexes = cv.dnn.NMSBoxes(boxes, confidences, 0.25, 0.25)

    result_class_ids = []
    result_confidences = []
    result_boxes = []

    for i in indexes:
        result_confidences.append(confidences[i])
        result_class_ids.append(class_ids[i])
        result_boxes.append(boxes[i])

    return result_class_ids, result_confidences, result_boxes


def gpu_trt_demo():
    class_list = load_classes()
    device = t.device('cuda:0')
    Binding = namedtuple('Binding', ('name', 'dtype', 'shape', 'data', 'ptr'))
    logger = trt.Logger(trt.Logger.INFO)
    with open("test_delete/drone_best_16.engine", 'rb') as f, trt.Runtime(logger) as runtime:
        model = runtime.deserialize_cuda_engine(f.read())
    bindings = OrderedDict()
    for index in range(model.num_bindings):
        name = model.get_binding_name(index)
        dtype = trt.nptype(model.get_binding_dtype(index))
        shape = model.get_binding_shape(index)
        data = t.from_numpy(np.empty(shape, dtype=np.dtype(dtype))).to(device)
        bindings[name] = Binding(name, dtype, shape, data, int(data.data_ptr()))
    binding_addrs = OrderedDict((n, d.ptr) for n, d in bindings.items())
    context = model.create_execution_context()

    capture = cv.VideoCapture("test_delete/drone.mp4")
    colors = [(255, 255, 0), (0, 255, 0), (0, 255, 255), (255, 0, 0)]
    while True:
        _, frame = capture.read()
        if frame is None:
            print("End of stream")
            break
        fh, fw, fc = frame.shape
        start = time.time()
        image = format_yolov8(frame)
        x_input = img_transform(image).view(1, 3, 640, 640).to(device)
        binding_addrs['images'] = int(x_input.data_ptr())
        context.execute_v2(list(binding_addrs.values()))
        out_prob = bindings['output0'].data.cpu().numpy()
        end = time.time()

        class_ids, confidences, boxes = wrap_detection(image, np.squeeze(out_prob, 0))
        for (classid, confidence, box) in zip(class_ids, confidences, boxes):
            if box[2] > fw * 0.67:
                continue
            color = colors[int(classid) % len(colors)]
            cv.rectangle(frame, box, color, 2)
            cv.rectangle(frame, (box[0], box[1] - 20), (box[0] + box[2], box[1]), color, -1)
            cv.putText(frame, class_list[classid] + " " + ("%.2f"%confidence), (box[0], box[1] - 10), cv.FONT_HERSHEY_SIMPLEX, .5, (0, 0, 0))

        inf_end = end - start
        fps = 1 / inf_end
        fps_label = "FPS: %.2f" % fps
        cv.putText(frame, fps_label, (10, 25), cv.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
        cv.imshow("YOLOv8 + TensorRT8.4.x Object Detection", frame)
        cc = cv.waitKey(1)
        if cc == 27:
            break
    cv.waitKey(0)
    cv.destroyAllWindows()


if __name__ == "__main__":
    gpu_trt_demo()
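The coordinate decoding inside wrap_detection can be checked in isolation. Below is a minimal sketch with synthetic values (three hand-made candidates, two classes, not real model output), assuming the standard 640x640 YOLOv8 input that produces the 1x6x8400 output tensor seen in the build log:

```python
import numpy as np

# Synthetic YOLOv8-style head output: rows are (cx, cy, w, h, score_cls0,
# score_cls1); the real output0 tensor is 1x6x8400 for a 640x640 input.
out = np.array([
    [320, 100, 500],    # cx
    [320, 100, 200],    # cy
    [ 64,  32,  16],    # w
    [ 64,  32,  16],    # h
    [0.9, 0.1, 0.05],   # class-0 score
    [0.1, 0.8, 0.10],   # class-1 score
], dtype=np.float32)

image_size = 640                     # square, letterboxed frame
x_factor = y_factor = image_size / 640.0

class_ids, confidences, boxes = [], [], []
for row in out.T:                    # one candidate per column, as in wrap_detection
    scores = row[4:]
    cls = int(np.argmax(scores))
    if scores[cls] > 0.25:           # confidence threshold; 3rd candidate drops out
        cx, cy, w, h = row[:4]
        left = int((cx - 0.5 * w) * x_factor)
        top = int((cy - 0.5 * h) * y_factor)
        boxes.append([left, top, int(w * x_factor), int(h * y_factor)])
        class_ids.append(cls)
        confidences.append(float(scores[cls]))

print(class_ids)  # [0, 1]
print(boxes)      # [[288, 288, 64, 64], [84, 84, 32, 32]]
```

The third candidate is rejected because its best class score (0.10) is below the 0.25 threshold, mirroring the filtering step before NMS in the main script.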

3.2 Runtime error: AttributeError: module 'numpy' has no attribute 'bool'

D:\Python3.8\lib\site-packages\tensorrt\__init__.py:329: FutureWarning: In the future `np.bool` will be defined as the corresponding NumPy scalar.
  bool: np.bool,
Traceback (most recent call last):
  File "F:/python_project/realsense435/000test.py", line 122, in <module>
    gpu_trt_demo()
  File "F:/python_project/realsense435/000test.py", line 77, in gpu_trt_demo
    dtype = trt.nptype(model.get_binding_dtype(index))
  File "D:\Python3.8\lib\site-packages\tensorrt\__init__.py", line 329, in nptype
    bool: np.bool,
  File "D:\Python3.8\lib\site-packages\numpy\__init__.py", line 305, in __getattr__
    raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'bool'.
`np.bool` was a deprecated alias for the builtin `bool`. To avoid this error in existing code, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

This is the error that terminates the program. It means that np.bool has been removed from (or is no longer supported by) the installed version of NumPy. Starting with version 1.20, NumPy deprecated np.bool and recommends the builtin bool or np.bool_ instead.

Solution: downgrade NumPy

pip install numpy==1.19.5 -i https://pypi.tuna.tsinghua.edu.cn/simple
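Downgrading NumPy can conflict with other packages. An alternative workaround, widely used but unofficial, is to restore the removed alias before tensorrt is imported:

```python
import numpy as np

# NumPy >= 1.24 removed the deprecated np.bool alias, but tensorrt 8.4's
# nptype() table still references it. Restoring the alias before
# "import tensorrt" avoids the AttributeError without downgrading NumPy.
if not hasattr(np, "bool"):
    np.bool = bool

# import tensorrt as trt   # safe to import once the alias exists
print(np.bool(1))  # True
```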
