This project demonstrates several ways to run a PyTorch detection model and compares their performance. To this end, three equivalent pipelines were implemented:
- PyTorch pipeline that receives its input from OpenCV VideoCapture as a NumPy array (host memory);
- PyTorch pipeline that receives its input from Torchaudio StreamReader with a hardware-accelerated video decoder as a GPU Torch tensor (device memory);
- Savant pipeline, based on NVIDIA DeepStream and TensorRT.
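
As a rough illustration of how pipelines like these could be compared on equal terms, the sketch below times an arbitrary frame source and processing callback and reports frames per second. It is not the project's benchmark code: `measure_fps`, its parameters, and the injectable `clock` are hypothetical names introduced here.

```python
import time
from typing import Callable, Iterable, TypeVar

Frame = TypeVar("Frame")


def measure_fps(
    frames: Iterable[Frame],
    process: Callable[[Frame], object],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Run `process` on every frame and return the achieved frames per second.

    `frames` and `process` are placeholders for a decoder and a detector;
    `clock` is injectable so the function can be checked deterministically.
    """
    count = 0
    start = clock()
    for frame in frames:
        process(frame)
        count += 1
    elapsed = clock() - start
    # Guard against an empty source or a zero-length measurement window.
    if count == 0 or elapsed <= 0:
        return 0.0
    return count / elapsed


if __name__ == "__main__":
    # Dummy workload standing in for decode + inference.
    fps = measure_fps(range(1000), lambda frame: sum(range(100)))
    print(f"{fps:.1f} FPS")
```

With a real GPU pipeline, the callback would need to wait for queued GPU work (for example via `torch.cuda.synchronize()`) before the timer stops, since CUDA kernels run asynchronously and the loop could otherwise finish before the inference does.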

Common pipeline inference parameters:
- GPU inference
- 640x640 inference dimensions
- batch size 1
- FP16 precision
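
To make the 640x640 inference dimensions concrete, the sketch below computes how a source frame would be scaled and padded to fit that input while keeping its aspect ratio (letterboxing). This is an assumption for illustration: the text does not say whether the pipelines letterbox or simply resize, and the 1280x720 source size is inferred from the `720p` in the test video name. `letterbox_geometry` is a hypothetical helper.

```python
from typing import Tuple


def letterbox_geometry(
    src_w: int, src_h: int, dst_w: int = 640, dst_h: int = 640
) -> Tuple[float, int, int, int, int]:
    """Return (scale, new_w, new_h, pad_x, pad_y) for an aspect-preserving fit.

    Assumption: symmetric padding, as commonly used for YOLO-style models.
    """
    # Use the smaller ratio so the whole frame fits inside the target.
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w = round(src_w * scale)
    new_h = round(src_h * scale)
    # Split the leftover space between the two sides.
    pad_x = (dst_w - new_w) // 2
    pad_y = (dst_h - new_h) // 2
    return scale, new_w, new_h, pad_x, pad_y


if __name__ == "__main__":
    print(letterbox_geometry(1280, 720))
```

Under these assumptions a 1280x720 frame is halved to 640x360 and padded by 140 pixels above and below.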

Benchmark pipelines are run in Docker containers.

Build the PyTorch container by running:

    make build-pytorch

Pull the Savant container by running:

    make pull-savant

Benchmark pipelines use an H.264 video as input. Download it by running:

    make get-test-video

Check that the `data/deepstream_sample_720p.mp4` file exists.

PyTorch pipelines use the YOLOv8m model from Ultralytics. Download the weights by running:

    make get-pytorch-model

Check that the `pytorch_weights/yolov8m.pt` file exists.

The Savant pipeline uses the same model exported to ONNX format. Run the export with:

    make run-export-onnx

Check that the `cache/models/yolov8m_pipeline/yolov8m/yolov8m.onnx` file exists.

Run the OpenCV VideoCapture version of the pipeline with:

    make run-pytorch-opencv

Run the Torchaudio + HW decoder version of the pipeline with:

    make run-pytorch-hw-decode

Run the Savant version of the pipeline with:

    make run-savant

| Test | FPS |
|---|---|
| PyTorch OpenCV | 75 |
| PyTorch HW Decode | 107 |
| Savant | 255 |

Hardware used:

| GPU | CPU | RAM, GiB |
|---|---|---|
| GeForce RTX 2080 | Intel Core i5-8600K CPU @ 3.60GHz | 31 |
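
The FPS figures in the results table can be turned into relative speedups and per-frame times with a few lines of arithmetic. The sketch below uses the numbers reported above; `summarize` is a hypothetical helper, and choosing the OpenCV pipeline as the baseline is an assumption made for the comparison.

```python
from typing import Dict

# FPS values copied from the results table.
RESULTS_FPS: Dict[str, float] = {
    "PyTorch OpenCV": 75,
    "PyTorch HW Decode": 107,
    "Savant": 255,
}


def summarize(results: Dict[str, float], baseline: str) -> Dict[str, Dict[str, float]]:
    """Return speedup over `baseline` and milliseconds per frame for each entry."""
    base_fps = results[baseline]
    return {
        name: {
            "speedup": round(fps / base_fps, 2),
            "ms_per_frame": round(1000 / fps, 2),
        }
        for name, fps in results.items()
    }


if __name__ == "__main__":
    for name, stats in summarize(RESULTS_FPS, "PyTorch OpenCV").items():
        print(f"{name}: {stats['speedup']}x, {stats['ms_per_frame']} ms/frame")
```

On these figures, hardware decoding is roughly 1.4 times faster than the OpenCV pipeline, and the Savant pipeline is about 3.4 times faster, at just under 4 ms per frame.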