Channel: Intel® Software - Media

Squeezing the best performance from H.264 encode/decode


My configuration is as follows:

    Graphics Devices:
        Name                                         Version             State
        AMD Radeon HD 7900 Series                    16.150.2211.0       Active
        Intel(R) HD Graphics 4600                    10.18.15.4256       08

    System info:
        CPU:    Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
        OS:    Microsoft Windows 10 Pro
        Arch:    64-bit

With an API 1.16 session (Intel(R)_Media_SDK_2016.0.2), I use the following parameters to encode H.264:

    parms.AsyncDepth = 4;
    parms.IOPattern = MFX_IOPATTERN_IN_VIDEO_MEMORY;
    parms.mfx.CodecId = MFX_CODEC_AVC;
    parms.mfx.CodecProfile = MFX_PROFILE_AVC_MAIN;
    parms.mfx.EncodedOrder = 0;
    parms.mfx.FrameInfo.FourCC = MFX_FOURCC_NV12;
    parms.mfx.FrameInfo.ChromaFormat = MFX_CHROMAFORMAT_YUV420;
    parms.mfx.FrameInfo.PicStruct = MFX_PICSTRUCT_PROGRESSIVE;
    parms.mfx.FrameInfo.Width = 1280;
    parms.mfx.FrameInfo.Height = 720;
    parms.mfx.FrameInfo.CropX = 0;
    parms.mfx.FrameInfo.CropY = 0;
    parms.mfx.FrameInfo.CropW = 1280;
    parms.mfx.FrameInfo.CropH = 720;
    parms.mfx.GopRefDist = 3;
    parms.mfx.GopPicSize = 60;
    parms.mfx.IdrInterval = 0;
    parms.mfx.NumRefFrame = 1;
    parms.mfx.NumSlice = 0;
    parms.mfx.RateControlMethod = MFX_RATECONTROL_CBR;
    parms.mfx.TargetUsage = MFX_TARGETUSAGE_BALANCED;
    parms.mfx.TargetKbps = 5000;

I'm using a D3D11FrameAllocator and MFX_IMPL_HARDWARE2 (my main graphics device is the AMD), initialised with MFX_IMPL_HARDWARE_ANY | MFX_IMPL_VIA_D3D11.

My encode speed was 179 ms per GOP (60 frames, 1280 x 720, 5000 kbps), which is around 3 ms per frame. My decode speed was 197 ms per GOP, which also rounds to about 3 ms per frame, so let's call them the same: approximately 333 frames per second.

How do these numbers compare to the theoretical maximums? We want to encode 2 streams of 1280 x 720 at 60 fps and decode 2, 3, 4 or more streams simultaneously. We have our own pipeline that further processes decoded GOPs for a scientific/industrial application, so we aren't using VPP. Apart from TargetUsage, is there any other way of squeezing more performance out of the encoder or decoder?

I noticed when profiling that the vast majority of processor time (> 85%) is spent locking the D3D11 surfaces. Can we speed this up in any way? For example, is there optimised code knocking around for converting UYUV420 to NV12 (and back again)? Our base format is UYUV420, and I haven't been able to find anything suitable on Google.
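For reference, assuming the base format here is a planar 4:2:0 layout (separate Y, U and V planes, I420-style), the repacking to and from NV12 is just a plane copy plus chroma interleave. The helpers below are a minimal scalar sketch of my own (not Media SDK functions, and the names are hypothetical); libraries such as libyuv ship SIMD-optimised equivalents (I420ToNV12 / NV12ToI420) if a tuned version is wanted. Strides are in bytes, and width/height (luma dimensions) are assumed even.

```cpp
#include <cstddef>
#include <cstdint>

// Pack planar 4:2:0 (Y + U + V planes) into NV12 (Y plane + interleaved UV),
// the layout the encoder surface expects. Writing row by row respects the
// surface pitch returned when the D3D11 surface is locked.
void i420_to_nv12(const uint8_t* srcY, size_t srcYStride,
                  const uint8_t* srcU, size_t srcUStride,
                  const uint8_t* srcV, size_t srcVStride,
                  uint8_t* dstY, size_t dstYStride,
                  uint8_t* dstUV, size_t dstUVStride,
                  int width, int height)
{
    // Luma plane copies straight across (source and destination pitches may differ).
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            dstY[y * dstYStride + x] = srcY[y * srcYStride + x];

    // Chroma planes are half-size in each dimension; interleave U and V.
    const int cw = width / 2, ch = height / 2;
    for (int y = 0; y < ch; ++y)
        for (int x = 0; x < cw; ++x) {
            dstUV[y * dstUVStride + 2 * x]     = srcU[y * srcUStride + x];
            dstUV[y * dstUVStride + 2 * x + 1] = srcV[y * srcVStride + x];
        }
}

// Reverse direction, for pulling decoded NV12 frames back into planar 4:2:0.
void nv12_to_i420(const uint8_t* srcY, size_t srcYStride,
                  const uint8_t* srcUV, size_t srcUVStride,
                  uint8_t* dstY, size_t dstYStride,
                  uint8_t* dstU, size_t dstUStride,
                  uint8_t* dstV, size_t dstVStride,
                  int width, int height)
{
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            dstY[y * dstYStride + x] = srcY[y * srcYStride + x];

    const int cw = width / 2, ch = height / 2;
    for (int y = 0; y < ch; ++y)
        for (int x = 0; x < cw; ++x) {
            dstU[y * dstUStride + x] = srcUV[y * srcUVStride + 2 * x];
            dstV[y * dstVStride + x] = srcUV[y * srcUVStride + 2 * x + 1];
        }
}
```

One caveat: if the locked D3D11 surface is write-combined memory, converting directly into it byte-by-byte can be slow; converting into a cached system-memory buffer first and then copying whole rows is usually faster.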

Note that the numbers I have here compare favourably with the output of sample_encode with the -calc_latency flag, so I'm thinking my implementation is close to optimal (assuming yours is!).

Thanks for any advice you can give me.

