My configuration is as follows:
Graphics Devices:
Name Version State
AMD Radeon HD 7900 Series 16.150.2211.0 Active
Intel(R) HD Graphics 4600 10.18.15.4256
System info:
CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
OS: Microsoft Windows 10 Pro
Arch: 64-bit
With a 1.16 session (Intel(R)_Media_SDK_2016.0.2), I encode H.264 with the following parameters:

parms.AsyncDepth = 4;
parms.IOPattern = MFX_IOPATTERN_IN_VIDEO_MEMORY;
parms.mfx.CodecId = MFX_CODEC_AVC;
parms.mfx.CodecProfile = MFX_PROFILE_AVC_MAIN;
parms.mfx.EncodedOrder = 0;
parms.mfx.FrameInfo.FourCC = MFX_FOURCC_NV12;
parms.mfx.FrameInfo.ChromaFormat = MFX_CHROMAFORMAT_YUV420;
parms.mfx.FrameInfo.PicStruct = MFX_PICSTRUCT_PROGRESSIVE;
parms.mfx.FrameInfo.Width = 1280;
parms.mfx.FrameInfo.Height = 720;
parms.mfx.FrameInfo.CropX = 0;
parms.mfx.FrameInfo.CropY = 0;
parms.mfx.FrameInfo.CropW = 1280;
parms.mfx.FrameInfo.CropH = 720;
parms.mfx.GopRefDist = 3;
parms.mfx.GopPicSize = 60;
parms.mfx.IdrInterval = 0;
parms.mfx.NumRefFrame = 1;
parms.mfx.NumSlice = 0;
parms.mfx.RateControlMethod = MFX_RATECONTROL_CBR;
parms.mfx.TargetUsage = MFX_TARGETUSAGE_BALANCED;
parms.mfx.TargetKbps = 5000;

I use a D3D11FrameAllocator and MFX_IMPL_HARDWARE2 (my main graphics device is the AMD card), together with MFX_IMPL_HARDWARE_ANY | MFX_IMPL_VIA_D3D11.
My encode speed was 179 ms per GOP (60 frames, 1280 x 720, 5000 kbps), which is around 3 ms per frame. My decode speed was 197 ms per GOP, which also rounds to about 3 ms per frame, so let's call them the same: approximately 333 frames per second.
How do these numbers compare to the theoretical maximums? We want to encode 2 streams of 1280 x 720 at 60 fps and decode 2, 3 or 4 (or more) streams simultaneously. We have our own pipeline that further processes decoded GOPs for a scientific/industrial application, so we aren't using VPP. Apart from TargetUsage, is there any other way of squeezing more performance out of the encoder or decoder? When profiling, I noticed that the vast majority of processor time (> 85%) is spent locking the D3D11 surfaces. Can we speed this up in any way? For example, is there optimised code around for converting UYUV420 to NV12 (and back again)? Our base format is UYUV420, and I haven't been able to find anything suitable on Google.
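For illustration, here is a minimal, unoptimised sketch of the conversion I have in mind, assuming "UYUV420" means a planar YUV 4:2:0 layout with separate U and V planes (NV12 keeps the Y plane as-is and interleaves the U/V samples). A production version would honour the surface pitch returned by the D3D11 lock, and would use SSE/AVX shuffles or a library such as libyuv (which provides an I420ToNV12 routine) rather than this scalar loop:

```cpp
#include <cstdint>
#include <cstring>

// Scalar reference conversion: planar YUV 4:2:0 (separate Y, U, V planes)
// to NV12 (Y plane followed by an interleaved UV plane).
// Assumes tightly packed planes; real D3D11 surfaces have a row pitch
// from the lock/Map call that must be used instead of `width`.
void PlanarYuv420ToNv12(const uint8_t* y, const uint8_t* u, const uint8_t* v,
                        uint8_t* nv12, int width, int height) {
    // The Y plane is identical in both layouts.
    std::memcpy(nv12, y, static_cast<size_t>(width) * height);

    // Interleave the half-resolution U and V planes into one UV plane.
    uint8_t* uv = nv12 + static_cast<size_t>(width) * height;
    const int chromaSamples = (width / 2) * (height / 2);
    for (int i = 0; i < chromaSamples; ++i) {
        uv[2 * i]     = u[i];  // U sample
        uv[2 * i + 1] = v[i];  // V sample
    }
}
```

The reverse direction (NV12 back to planar) is the same loop with the assignments swapped. Since the profile shows the time going into the surface lock itself rather than the copy, it may also be worth experimenting with write-combined staging surfaces and keeping the number of lock/unlock round trips per frame to a minimum.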
Note that the numbers I have here compare favourably with the output of sample_encode with the -calc_latency flag, so I think my implementation is close to optimal (assuming yours is!).
Thanks for any advice you can give me.