We are using ffmpeg to capture live video from a Blackmagic DeckLink Mini Recorder and encode to multiple h264 outputs (streamed over RTMP and written to disk). We're running CentOS 7.2 with Media Server Studio 2017 on a Core i7-6700K.
We were very impressed with h264_qsv performance when we did multiple transcodes of a 1920x1080 MP4 file -- we were able to run 4 independent transcodes to 7 different outputs (3 streams, 4 files), and it all ran at 3x real time with minimal CPU impact (maybe 10-15%).
Based on this performance, we thought that it would be trivial to do real-time encoding of HD video.
Unfortunately, we have found that the DeckLink capture buffer (1 GB) will fill up, causing frames to drop. Sometimes it takes an hour before this happens, and sometimes it only takes a few minutes. But it seems to always happen eventually.
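To put the 1 GB buffer in perspective: assuming the card hands us 8-bit UYVY frames (2 bytes per pixel -- an assumption on our part; the actual pixel format may differ), the buffer only holds a few seconds of video, so any sustained slower-than-real-time stretch in the encoder drains it:

```python
# Back-of-the-envelope: how much slack does a 1 GB capture buffer give us?
# Assumes 1080p UYVY (2 bytes/pixel) at ~29.97 fps; 1 GB taken as 2**30 bytes.
width, height, bytes_per_pixel = 1920, 1080, 2
fps = 29.97

frame_bytes = width * height * bytes_per_pixel    # ~4.15 MB per raw frame
buffer_frames = (1 * 1024**3) // frame_bytes      # whole frames that fit in 1 GB
slack_seconds = buffer_frames / fps               # time until overrun if encoding stalls

print(frame_bytes, buffer_frames, round(slack_seconds, 1))
```

So under those assumptions we only have on the order of eight or nine seconds of headroom before frames start dropping.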
Here is a sample invocation:
https://gist.github.com/jpriebe/9acdc3beb50547449bd47f4fb46de214
(Normally we would "tee" the low/medium/high encodings to an RTMP streaming server; I have removed that to keep the example a little simpler.)
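For readers who don't want to follow the gist, the command has roughly this shape (this is a stripped-down sketch, not our exact invocation -- the device name, capture mode, and bitrates here are placeholders):

```shell
# Sketch only: capture from the DeckLink and produce three h264_qsv encodes.
# The same input video stream is mapped into each output file.
ffmpeg -f decklink -i 'DeckLink Mini Recorder' \
  -map 0:v -c:v h264_qsv -preset veryfast -b:v 1000k low.mp4 \
  -map 0:v -c:v h264_qsv -preset veryfast -b:v 2500k medium.mp4 \
  -map 0:v -c:v h264_qsv -preset veryfast -b:v 5000k high.mp4
```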
We have tried everything to streamline/speed up the QuickSync encoding:
- use veryfast preset
- fix the min/max bitrates to the target bitrate to force CBR
- force the GPU clock to 1150MHz
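Concretely, those tweaks looked something like this (the bitrate values are illustrative, and the sysfs paths are the standard i915 frequency knobs rather than anything QuickSync-specific):

```shell
# CBR-style rate control for one of the h264_qsv outputs (values illustrative):
#   -c:v h264_qsv -preset veryfast -b:v 4000k -minrate 4000k -maxrate 4000k -bufsize 8000k

# Pin the Intel GPU clock at 1150 MHz via the i915 sysfs interface:
echo 1150 | sudo tee /sys/class/drm/card0/gt_min_freq_mhz
echo 1150 | sudo tee /sys/class/drm/card0/gt_max_freq_mhz
```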
No matter what settings we use, the buffer eventually overruns. The strange thing is that the GPU load (as measured by the metrics_monitor utility) stays under 20%, and the clock speed is confirmed at 1150 MHz. So it doesn't seem like the GPU is overloaded.
We tried offloading some of the encodings to the CPU with libx264, to no avail. If even one of the four encodings runs on the GPU, we get an overrun.
By comparison, if we run all 4 encodings on the CPU, we can run indefinitely with no buffer overrun.
It really seems like there is some sort of bottleneck in the h264_qsv encoding path. I don't know whether the problem is in ffmpeg, in QuickSync itself, or just in my understanding of how it all works.
Any insight would be much appreciated. Thanks!