How to optimize multicore encoding

Questions involving a Windows version of FFmpeg.
marcjn
Posts: 46
Joined: Tue Jun 21, 2011 12:36 am

How to optimize multicore encoding

Post by marcjn » Thu Jul 12, 2012 11:25 pm

I am planning to invest in a fast multicore processor (maybe AMD Phenom II X6 1100T) hoping to cut the time spent on encoding videos with ffmpeg. Before I do that, I want to verify that the benefit is worth the cost.
I am getting confused as I google the multi-thread / multicore issue. It looks like '--enable-pthreads' is needed, but the version I am running now, and I believe the other Zeranoe static builds, don't seem to have it, or at least, don't seem to show it anywhere.
However, when I run my ffmpeg build on a 4-core laptop:

"C:\Program Files (x86)\ffmpeg_f514695\ffmpeg-git-f514695-win32-static\bin\ffmpeg.exe" -i "Kat en duif.wmv" -threads 4 -vcodec libx264 -crf 22 -vpre hq -y -f mp4  -acodec ac3 -ac 2 -ar 48000 -b:a 128k Kat.mp4
ffmpeg version N-36193-gf514695, Copyright (c) 2000-2011 the FFmpeg developers
  built on Dec 26 2011 17:50:37 with gcc 4.6.2
  configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-runtime-cpudetect --enable-avisynth --enable-bzlib --enable-frei0r --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libfreetype --enable-libgsm --enable-libmp3lame --enable-libopenjpeg --enable-librtmp --enable-libschroedinger --enable-libspeex --enable-libtheora --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libxavs --enable-libxvid --enable-zlib
  libavutil      51. 33.100 / 51. 33.100
  libavcodec     53. 48.100 / 53. 48.100
  libavformat    53. 28.100 / 53. 28.100
  libavdevice    53.  4.100 / 53.  4.100
  libavfilter     2. 54.100 /  2. 54.100
  libswscale      2.  1.100 /  2.  1.100
  libswresample   0.  5.100 /  0.  5.100
  libpostproc    51.  2.100 / 51.  2.100
[asf @ 003ABA00] max_analyze_duration 5000000 reached at 5194000
all the processors seem to be running at equal load.

My question is: is it possible that the multi-thread capability is somewhat embedded in the build and I just don't see it, or should I think about compiling my own build with --enable-pthreads?
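
One way to see what a build says about threading is to pull the thread-related flags out of the `configuration:` line of the banner. The following is a minimal Python sketch, assuming the banner text has been captured with the configuration on a single line; the function names are made up for illustration. Keep in mind that the line only lists what was passed to configure, so a missing `--enable-pthreads` does not by itself show that pthreads is off.

```python
def configure_flags(banner):
    """Return the configure flags listed in an ffmpeg startup banner."""
    for line in banner.splitlines():
        line = line.strip()
        if line.startswith("configuration:"):
            # Everything after the label is a space-separated flag list.
            return line[len("configuration:"):].split()
    return []


def threading_flags(flags):
    """Keep only the flags that mention pthreads or w32threads."""
    return [f for f in flags if f.endswith("pthreads") or f.endswith("w32threads")]


# Shortened copy of the banner from this post (assumed to be unwrapped).
banner = """ffmpeg version N-36193-gf514695
  built on Dec 26 2011 17:50:37 with gcc 4.6.2
  configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-libx264
"""

print(threading_flags(configure_flags(banner)))
```

For the banner above, the only thread-related flag that shows up is `--disable-w32threads`.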

rogerdpack
Posts: 1878
Joined: Fri Aug 05, 2011 9:56 pm

Re: How to optimize multicore encoding

Post by rogerdpack » Fri Jul 13, 2012 12:23 am

"Some" video codecs can use multiple threads (thankfully libx264, the most popular, is one of them). I assume from your statement "seem to be running at equal load" that it's using all available [real] cores at 100%? If so, you're good, and threading is enabled. I believe the default Zeranoe builds have pthreads enabled. Interesting that you should ask, however, as there has been some discussion about whether disabling pthreads and enabling "w32threads" is the way to go (Zeranoe's builds use pthreads).

Also note that the same discussion seemed to imply that 64-bit builds were slightly faster than 32-bit builds.

http://ffmpeg.org/pipermail/ffmpeg-user ... 07976.html

GL!
-r

marcjn
Posts: 46
Joined: Tue Jun 21, 2011 12:36 am

Re: How to optimize multicore encoding

Post by marcjn » Sat Jul 14, 2012 3:56 am

I am somewhat uncertain about how this works, and I am doing some simple tests, which I will describe here. In general, my processors don't run at 100%. Only with libx264 and with '-threads 0' or no threads option does it get close to 100%. With mpeg4, the fastest codec by far, CPU use reaches above 55% only with '-threads 6' and '-threads 0'.

Regarding pthreads and w32threads, I have read that if you use one, you should disable the other, and I have noticed that the standard Zeranoe build does disable w32threads. But I don't know how to change that without recompiling, which for me is not an easy project on Windows.

At this point, it looks like the '-threads n' option works differently with the libx264, libxvid and mpeg4 codecs when I am transcoding video (a wmv clip in my test).

Some preliminary observations on a 4-core machine:
- With libx264, the best outcome (in terms of speed, i.e. frames per second) is with -threads 0 (which is supposed to detect how many threads are available) or with no -threads option.
- With libxvid, the best outcomes are with -threads 6 (still on a 4-core machine) and -threads 4, and the worst is with -threads 0 or no -threads option.
- With mpeg4, -threads 6 and -threads 0 are the best, and it outperforms libxvid and libx264 by far.
Conclusion so far: libx264 and mpeg4 seem to respond to the -threads option, but not libxvid, which is really slow anyway. More detailed results to come.

One of my issues is that I don't see what is supported by the codecs, and enabled/disabled by the ffmpeg build. I wish someone knowledgeable could shed some light on this question.

rogerdpack
Posts: 1878
Joined: Fri Aug 05, 2011 9:56 pm

Re: How to optimize multicore encoding

Post by rogerdpack » Sat Jul 14, 2012 5:17 am

There has been some discussion about difficulty with pthreads and libx264 for some reason. You could try this build and see how it compares:
http://x32.elijst.nl/FFmpeg-20120622.7z

It would be interesting to compare console output for mp4 versus libx264, it's possible that one is giving poorer quality which would help explain why it's faster.

It's also odd to me that it doesn't use all cores. Hmm...

Seems like at the least -threads 0 for libxvid is broken :P

> One of my issues is that I don't see what is supported by the codecs,

For instance? What are you trying to see?

-r

marcjn
Posts: 46
Joined: Tue Jun 21, 2011 12:36 am

Re: How to optimize multicore encoding

Post by marcjn » Sat Jul 14, 2012 6:27 pm

> You could try this build and see how it compares:
> http://x32.elijst.nl/FFmpeg-20120622.7z

I will definitely try it. I assume it's a Windows build.

> It would be interesting to compare console output for mp4 versus libx264, it's possible that one is giving poorer quality which would help explain why it's faster.

I am redoing my tests with an mpeg2 test clip that should be easier to compare for quality with the compressed versions.

> > One of my issues is that I don't see what is supported by the codecs,
> For instance? What are you trying to see?

I just want to know what multithreading functionality the codecs have or don't have, independently of what the ffmpeg build supports.

j

rogerdpack
Posts: 1878
Joined: Fri Aug 05, 2011 9:56 pm

Re: How to optimize multicore encoding

Post by rogerdpack » Mon Jul 16, 2012 7:43 am

marcjn wrote: I just want to know what multithreading functionality the codecs have or don't have, independently of what the ffmpeg build supports.

Interestingly, a few columns were recently added to the "ffmpeg -codecs" output that might be of interest to you. They describe which type of threading each codec supports (you'll need a fairly recent version of ffmpeg to get this output).

-r

marcjn
Posts: 46
Joined: Tue Jun 21, 2011 12:36 am

Re: How to optimize multicore encoding

Post by marcjn » Wed Jul 18, 2012 1:27 am

> Interestingly, recently a few columns were added to the "ffmpeg -codecs" output that might be of interest to you.

Thanks for the tip. I will definitely download the latest version. Mine (December 2011) doesn't have that information.

> It would be interesting to compare console output for mp4 versus libx264, it's possible that one is giving poorer quality which would help explain why it's faster.

I agree, quality is a parameter I also considered. I am not sure how you can compare quality based on the console output, but I can post some of the output if you would like to look at it. To compare quality, I run the ssim filter in avisynth, comparing the original with the compressed version. ssim = 100 means identical.
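
To make the "ssim = 100 means identical" remark concrete, here is a rough Python sketch of the textbook SSIM formula applied once to two whole lists of pixel values. This is an assumption-laden simplification: the avisynth filter works on real frames and its exact windowing is not described in this thread, and the function names and sample values are made up.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Population covariance; covariance(xs, xs) is the variance of xs."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def global_ssim_percent(xs, ys, peak=255.0):
    """SSIM over one global window, scaled to percent (100 = identical)."""
    # Usual stabilising constants for the given peak pixel value.
    c1 = (0.01 * peak) ** 2
    c2 = (0.03 * peak) ** 2
    mx, my = mean(xs), mean(ys)
    var_x, var_y = covariance(xs, xs), covariance(ys, ys)
    cov_xy = covariance(xs, ys)
    numerator = (2 * mx * my + c1) * (2 * cov_xy + c2)
    denominator = (mx * mx + my * my + c1) * (var_x + var_y + c2)
    return 100.0 * numerator / denominator


original = [10, 50, 90, 130]     # made-up pixel values
compressed = [12, 40, 100, 120]  # made-up "lossy" version

print(global_ssim_percent(original, original))
print(global_ssim_percent(original, compressed))
```

Comparing a clip with itself gives 100, and any difference between the two inputs pulls the score below that.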
I have been a little slow putting my tests and my thoughts together, but here it is.
Here is a summary of the tests, where I show, for the various codecs:
- the value 'n' of the '-threads n' option,
- the cpu use, as seen (approximately) on Windows Task Manager,
- the number of frames processed per second (fps) by the codec,
- a quality option: constant rate factor (crf) for libx264, the encoding bitrate in kbps for libxvid and mpeg4,
- and the ssim score in % calculated with the avisynth SSIM() function, which computes an objective similarity measure between the original clip and the compressed version.
Tests run on Intel Core2 Quad Q9000 @2.00GHz, Win7. Using ffmpeg version N-36193-gf514695, Copyright (c) 2000-2011 the FFmpeg developers built on Dec 26 2011 17:50:37 with gcc 4.6.2 (Zeranoe static build)
Note that cpu usage percentage takes the average of the four processors: for example, 24% usage may mean one processor at 96% and three processors at 0%.
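
The averaging in that note can be checked with a couple of lines of Python. This is only an illustration of the arithmetic; the per-core numbers are the hypothetical ones from the note, not measurements.

```python
def average_cpu_use(per_core_percent):
    """Average load across cores, which is the single figure reported here."""
    return sum(per_core_percent) / len(per_core_percent)


one_busy_core = [96, 0, 0, 0]      # a single saturated core
evenly_spread = [24, 24, 24, 24]   # light load on every core

print(average_cpu_use(one_busy_core))
print(average_cpu_use(evenly_spread))
```

Both cases average to 24%, which is why the per-cpu remarks are listed next to the results.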

libx264 (four processors always working at similar load)
-------
threads   cpu use   fps   crf   ssim
6         ~93%       54   22    77.85
4         ~85%       44   22    77.87
2         ~52%       32   22    77.84
0         ~98%       54   22    77.85
(none)    ~98%       54   22    77.85

4         ~78%       55   24    75.21
4         ~71%       65   26    73.40

libxvid
-------
threads   cpu use   fps   bitrate   ssim    per-cpu load
6         ~83%        9   650K      72.14   same load for each cpu
4         ~78%       11   650K      72.14   same load for each cpu
2         ~52%        7   650K      72.14   cpu 3 ~80%
0         ~24%        4   650K      72.14   cpu 2-3 ~40%
(none)    ~24%        4   650K      72.14   cpu 3 ~75%

mpeg4
-----
threads   cpu use   fps   bitrate   ssim    per-cpu load
6         ~57%      106   650K      72.04   similar load
4         ~70%      109   650K      72.09   similar load
2         ~45%       79   650K      72.09   cpus 1-2 ~60%
0         ~53%      102   650K      72.05   similar load
(none)    ~24%       50   650K      72.05   cpu 2 ~60%

4         ~63%       92   1000K     72.99   similar load
4         ~69%       70   2000K     73.35   similar load

Regarding the effect of the '-threads n' option, it really depends on the codec used:
- libx264 runs fastest with more threads than processors, or with -threads 0 or no -threads option. The worst case is '-threads n' with n <= number of processors, so it is better not to use the -threads option.
- libxvid, on the other hand, runs best with n = number of processors, but this codec is VERY SLOW.
- mpeg4 with the XVID tag is very fast, and also works best with n = number of processors.
As far as quality is concerned: if I reduce the quality of the libx264 encoding (by increasing crf to 24 and 26) to bring it closer to libxvid or mpeg4, the frames per second go up and the cpu usage goes down (seems normal, less processing needed). If I increase the quality (higher bitrate) for the mpeg4 encoding, the fps goes down (seems normal) and the cpu usage stays the same or goes down.
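
The comparison can also be read off the numbers directly. Below is a small Python sketch that takes the fps figures from the tables above (crf=22 for libx264, 650K for the other two) and reports, per codec, the fastest -threads setting and its gain over running with no -threads option. The dictionary layout and function names are just one way to organise it.

```python
# fps per -threads setting, copied from the result tables in this post.
FPS = {
    "libx264": {"6": 54, "4": 44, "2": 32, "0": 54, "none": 54},
    "libxvid": {"6": 9, "4": 11, "2": 7, "0": 4, "none": 4},
    "mpeg4": {"6": 106, "4": 109, "2": 79, "0": 102, "none": 50},
}


def best_settings(codec):
    """All -threads settings that reach the highest fps for this codec."""
    top = max(FPS[codec].values())
    return [setting for setting, fps in FPS[codec].items() if fps == top]


def gain_over_default(codec):
    """Best fps divided by the fps measured with no -threads option."""
    runs = FPS[codec]
    return max(runs.values()) / runs["none"]


for codec in FPS:
    print(codec, best_settings(codec), round(gain_over_default(codec), 2))
```

On these figures, picking the thread count by hand gains nothing for libx264, about 2.75x for libxvid, and about 2.18x for mpeg4.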

rogerdpack
Posts: 1878
Joined: Fri Aug 05, 2011 9:56 pm

Re: How to optimize multicore encoding

Post by rogerdpack » Sat Jul 21, 2012 10:05 pm

Did you try the win32threads builds I mentioned, which may have a faster libx264? I presume the output files are about the same size for the tests mentioned?

rogerdpack
Posts: 1878
Joined: Fri Aug 05, 2011 9:56 pm

Re: How to optimize multicore encoding

Post by rogerdpack » Tue Jul 31, 2012 8:14 pm

Interesting that your mpeg4 encode had such a good SSIM.
http://x264dev.multimedia.cx/wp-content ... chart1.png doesn't seem to show the same...
http://x264dev.multimedia.cx/archives/102
But maybe it is using different data?

marcjn
Posts: 46
Joined: Tue Jun 21, 2011 12:36 am

Re: How to optimize multicore encoding

Post by marcjn » Sun Aug 05, 2012 3:18 am

Well, sorry it's taking me so long.
I've downloaded the build you suggested (http://ffmpeg.zeranoe.com/builds/win32/ ... -static.7z) and had all kinds of problems: first setting it up on my Windows 7 PC, and then realizing that it didn't recognize the ffmpeg presets I was using, only the x264 presets. So I have to learn what is more or less equivalent to what I had selected with the ffmpeg settings. That also makes a comparison with the previous tests difficult, as I can't reproduce the exact same options, at least for x264. However, I hope I can keep the same ones for XVID. I will let you know if it's faster.

Regarding mpeg4 vs xvid, maybe I should have been more explicit. The "Diary of an x264 developer" post shows different results for mpeg4 and xvid because he is using different settings for each. In my tests, I find similar results because I used the same settings, changing only the encoder (mpeg4 or libxvid). The quality should stay more or less the same; only the performance changes. Here are the full commands for mpeg4 and libxvid, respectively:
ffmpeg.exe -i gps.m2v -threads 6 -vcodec mpeg4 -vtag XVID -b:v 650.0K -g 240 -trellis 2 -mbd rd -flags +mv4+aic -y -an gps-mpeg6.avi
ffmpeg.exe -i gps.m2v -threads 6 -vcodec libxvid -vtag XVID -b:v 650.0K -g 240 -trellis 2 -mbd rd -flags +mv4+aic -y -an gps-libxvid6.avi
I am planning to redo the tests with this new build.
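
For rerunning the whole series, the two commands above could be generated rather than typed by hand. The following is a Python sketch only: the options are the ones from the commands above, while the output file naming and the function names are made up, and `None` stands for "no -threads option".

```python
def build_command(encoder, threads, source="gps.m2v"):
    """Build one ffmpeg argument list; threads=None omits the -threads option."""
    cmd = ["ffmpeg.exe", "-i", source]
    if threads is not None:
        cmd += ["-threads", str(threads)]
    cmd += ["-vcodec", encoder, "-vtag", "XVID", "-b:v", "650.0K", "-g", "240",
            "-trellis", "2", "-mbd", "rd", "-flags", "+mv4+aic", "-y", "-an"]
    # Hypothetical naming scheme so each run writes its own file.
    label = "none" if threads is None else str(threads)
    cmd.append(f"gps-{encoder}-t{label}.avi")
    return cmd


def test_matrix(encoders=("mpeg4", "libxvid"), thread_counts=(6, 4, 2, 0, None)):
    """One command per encoder and thread setting, in table order."""
    return [build_command(e, t) for e in encoders for t in thread_counts]


for cmd in test_matrix():
    print(" ".join(cmd))
```

Each list could then be handed to `subprocess.run(cmd, check=True)` and timed, with the same source clip used for every run so the fps figures stay comparable.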
