cross-compiling updated cuda filters ?

Questions that occur when trying to compile FFmpeg.
hydra3333
Posts: 201
Joined: Sun Apr 28, 2013 1:03 pm

cross-compiling updated cuda filters ?

Post by hydra3333 » Thu Feb 21, 2019 8:55 am

Hello.

This thread
http://ffmpeg.org/pipermail/ffmpeg-deve ... 40328.html
seems to discuss moving the compilation of some CUDA-related items, e.g. some filters, to another mechanism. It says:
compiling cuda kernels to the ptx format does not
introduce any non-free dependencies - the ptx files are an intermediate
assembly code format that is actually compiled to binary form at
runtime. With that understood, we just need to switch the remaining
users of the CUDA SDK to ffnvcodec and we will remove the non-free
entanglements from cuda.
and
The use of nvcc to compile cuda kernels is distinct from the use of
cuda sdk libraries and linking against those libraries. We have
previously not bothered to distinguish these two cases because all
the filters that used cuda kernels also used the sdk. In the following
changes, I'm going to remove the sdk dependency from those filters,
but we need a way to ensure that nvcc is present and functioning, and
also a way to explicitly disable its use so that the filters are not
built.

Note that, unlike the cuda_sdk dependency, using nvcc to compile
a kernel does not cause a build to become non-free. Although nvcc
is distributed with the cuda sdk, and is EULA encumbered, the
compilation process we use does not introduce any EULA covered
code or libraries into the build. In this sense, using nvcc is just
like using any other proprietary compiler like msvc - compiling free
code doesn't suddenly make it non-free.

There was previously some confusion on this topic, but the important
distinction is that we use nvcc to generate ptx files - these are
not compiled GPU binaries, but rather an intermediate assembly
representation that is JIT compiled (and I think linked with certain
nvidia library code) when you actually try and run the kernel. nvidia
use this technique to relax machine code compatibility between
hardware generations.

From here, we can make two observations:
* The ptx files that we include in libavfilter are aggregated rather
than linked, from the perspective of the (L)GPL
* No proprietary code is included with the ptx files. That code is
only linked in at the final compilation step at runtime.
So, for those of us who cross-compile ffmpeg but do not (yet) understand what else is needed to cross-compile these filters, given the requirement quoted above that
we need a way to ensure that nvcc is present and functioning
does anyone know whether it is possible to cross-compile the updated filters by somehow adding nvcc, e.g. under Ubuntu?
And, if one would be so kind, how?
And, I hope finally, which files, if any, need to be copied before, during or after cross-compilation?
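
For what it's worth, here is a minimal sketch (Python, my own illustration, not something taken from the mailing-list thread) of the "nvcc is present and functioning" check and the kernel-to-PTX step that the quotes describe. The file names are hypothetical, and I am assuming the only flag that matters is nvcc's -ptx, which stops after emitting the PTX text; FFmpeg's own configure may well pass different or additional flags.

```python
import shutil


def nvcc_available(nvcc="nvcc"):
    """Return True if an executable with this name is found on PATH."""
    return shutil.which(nvcc) is not None


def ptx_command(cu_file, ptx_file, nvcc="nvcc", extra_flags=()):
    """Build the argument list that compiles one CUDA source file to PTX.

    -ptx makes nvcc stop after generating the PTX intermediate assembly,
    so no GPU binary is produced at build time.
    """
    return [nvcc, "-ptx", *extra_flags, "-o", ptx_file, cu_file]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = ptx_command("vf_example.cu", "vf_example.ptx")
    if nvcc_available():
        print("would run:", " ".join(cmd))
    else:
        print("nvcc not found on PATH; the CUDA filters could not be built")
```

Since the PTX is only JIT-compiled at run time on the target machine, the sketch suggests that nvcc only has to run on the build host, which is why an Ubuntu build box targeting Windows looks plausible in principle.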


Re: cross-compiling updated cuda filters ?

Post by hydra3333 » Thu Feb 21, 2019 9:05 am

Ah.

This https://docs.nvidia.com/cuda/cuda-compi ... index.html says
All non-CUDA compilation steps are forwarded to a C++ host compiler that is supported by nvcc, and nvcc translates its options to appropriate host compiler command line options.
And this https://docs.nvidia.com/cuda/cuda-insta ... index.html seems to list CUDA 10 with Ubuntu 18.04.1 and GCC 7.3.0 as a supported combination.
And this https://docs.nvidia.com/cuda/cuda-insta ... s-platform seems to indicate that cross-compilation is impossible?


Re: cross-compiling updated cuda filters ?

Post by hydra3333 » Thu Feb 28, 2019 7:47 am

Well well ... a smart cookie seems to have gotten it to cross-compile ... in an Ubuntu VM, too ...

https://github.com/DeadSix27/python_cro ... -467827333

edit: and tickle me with a feather duster, it builds :)


Re: cross-compiling updated cuda filters ?

Post by hydra3333 » Sat Mar 02, 2019 6:57 am

It seems ffmpeg is cross-compilable (non-free, using the nvidia CUDA toolkit) with the now-inbuilt GPU based YADIF_CUDA deinterlacer mentioned in the nvidia forum here: https://devtalk.nvidia.com/default/topi ... deint-2-/2
ffmpeg documentation here https://ffmpeg.org/ffmpeg-filters.html#yadif_005fcuda

Has anyone done any comparative speed tests, e.g. yadif_cuda vs vanilla yadif,
with settings such as mode=0, parity=-1, deint=0?
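
To make the settings above concrete, here is a small sketch (Python, purely illustrative) that renders them as named filter options, so that the CPU and GPU filter strings are guaranteed to use the same mode/parity/deint values. The helper name filter_args is my own invention; the filter and option names (yadif, yadif_cuda, hwupload_cuda, hwdownload, format) are the ones in the ffmpeg filter documentation linked above.

```python
def filter_args(name, **options):
    """Render one FFmpeg filter as name=key=value:key=value."""
    if not options:
        return name
    return name + "=" + ":".join(f"{key}={value}" for key, value in options.items())


# The same deinterlacing settings for both variants.
settings = dict(mode=0, parity=-1, deint=0)

# CPU path: plain yadif on system-memory frames.
cpu_chain = filter_args("yadif", **settings)

# GPU path: upload to CUDA, deinterlace there, download again.
gpu_chain = ",".join([
    "hwupload_cuda",
    filter_args("yadif_cuda", **settings),
    "hwdownload",
    filter_args("format", pix_fmts="yuv420p"),
])

if __name__ == "__main__":
    print(cpu_chain)
    print(gpu_chain)
```

Either string can then be passed to ffmpeg after -vf or inside -filter_complex; keeping the settings in one place avoids comparing two runs that silently differ in parity.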


Re: cross-compiling updated cuda filters ?

Post by hydra3333 » Mon Mar 04, 2019 9:49 am

well, yes, me, now.

https://forum.videohelp.com/threads/392 ... ost2544357

1. vanilla yadif followed by unsharp_opencl
"C:\SOFTWARE\ffmpeg\0-homebuilt-x64\ffmpeg.exe" -loglevel warning -stats -init_hw_device opencl=ocl:0.0 -filter_hw_device ocl -i ".\1.7TWO.mpg" -t 60 -map_metadata -1 -sws_flags lanczos+accurate_rnd+full_chroma_int+full_chroma_inp -filter_complex "[0:v]yadif=0:0:0,hwupload,unsharp_opencl=lx=3:ly=3:la=0.5:cx=3:cy=3:ca=0.5,hwdownload,format=pix_fmts=yuv420p,setdar=dar=16/9" -r 25 -c:v h264_nvenc -pix_fmt nv12 -preset slow -bf 2 -g 50 -refs 3 -rc:v vbr_hq -rc-lookahead:v 32 -cq 22 -qmin 16 -qmax 25 -coder cabac -strict experimental -movflags +faststart+write_colr -profile:v high -level 4.1 -af loudnorm=I=-16:TP=0.0:LRA=11:measured_I=-25.78:measured_LRA=4.50:measured_TP=-6.82:measured_thresh=-36.00:offset=0.17:linear=true:print_format=summary -c:a libfdk_aac -cutoff 18000 -ab 384k -ar 48000 -y ".\1.7TWO.aac.standard.mp4"

frame= 1500 fps=142 q=18.0 Lsize=   15010kB time=00:01:00.01 bitrate=2049.0kbits/s speed=5.66x 

2. yadif_cuda followed by unsharp_opencl
"C:\SOFTWARE\ffmpeg\0-homebuilt-x64\ffmpeg.exe" -loglevel warning -stats -init_hw_device opencl=ocl:0.0 -filter_hw_device ocl -i ".\1.7TWO.mpg" -t 60 -map_metadata -1 -sws_flags lanczos+accurate_rnd+full_chroma_int+full_chroma_inp -filter_complex "[0:v]hwupload_cuda,yadif_cuda=0:-1:0,hwdownload,format=pix_fmts=yuv420p,hwupload,unsharp_opencl=lx=3:ly=3:la=0.5:cx=3:cy=3:ca=0.5,hwdownload,format=pix_fmts=yuv420p" -r 25 -c:v h264_nvenc -pix_fmt nv12 -preset slow -bf 2 -g 50 -refs 3 -rc:v vbr_hq -rc-lookahead:v 32 -cq 22 -qmin 16 -qmax 25 -coder cabac -strict experimental -movflags +faststart+write_colr -profile:v high -level 4.1 -af loudnorm=I=-16:TP=0.0:LRA=11:measured_I=-25.78:measured_LRA=4.50:measured_TP=-6.82:measured_thresh=-36.00:offset=0.17:linear=true:print_format=summary -c:a libfdk_aac -cutoff 18000 -ab 384k -ar 48000 -y ".\1.7TWO.aac.yadif_cuda.opencl.mp4"

frame= 1500 fps=125 q=18.0 Lsize=   15000kB time=00:01:00.01 bitrate=2047.7kbits/s speed=5.01x

I suppose it's the data copies to and from the GPU that do it in.
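
To put a number on the difference, here is a small sketch (Python, my own arithmetic on the two stats lines above, nothing more) that pulls fps and speed out of each line and reports the relative slowdown. Bear in mind it is a single 60-second run each, that the two commands are not quite like for like (yadif=0:0:0 versus yadif_cuda=0:-1:0, so the parity setting differs), and that the second chain has an extra hwdownload/hwupload hop between CUDA and OpenCL.

```python
import re

# The two stats lines exactly as reported above.
CPU_LINE = "frame= 1500 fps=142 q=18.0 Lsize=   15010kB time=00:01:00.01 bitrate=2049.0kbits/s speed=5.66x"
GPU_LINE = "frame= 1500 fps=125 q=18.0 Lsize=   15000kB time=00:01:00.01 bitrate=2047.7kbits/s speed=5.01x"


def parse_stats(line):
    """Extract (fps, speed) from one ffmpeg -stats progress line."""
    fps = float(re.search(r"fps=\s*([\d.]+)", line).group(1))
    speed = float(re.search(r"speed=\s*([\d.]+)x", line).group(1))
    return fps, speed


def slowdown_percent(baseline, other):
    """How much slower 'other' is than 'baseline', in percent."""
    return (baseline - other) / baseline * 100.0


cpu_fps, cpu_speed = parse_stats(CPU_LINE)
gpu_fps, gpu_speed = parse_stats(GPU_LINE)

if __name__ == "__main__":
    print(f"by fps:   yadif_cuda chain is {slowdown_percent(cpu_fps, gpu_fps):.1f}% slower")
    print(f"by speed: yadif_cuda chain is {slowdown_percent(cpu_speed, gpu_speed):.1f}% slower")
```

By either measure the yadif_cuda chain comes out roughly 12% slower than the plain yadif chain in this one test.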

If only I were able to cross-compile an ffmpeg with VapourSynth built in (there are no simple-to-follow step-by-step instructions), then using a single ffmpeg.exe would be "painless" and insanely fast with DG's latest gear.

Oh well.
