R76 – Respecting Cache Limits

R76 completely reworks the cache limits and adheres to them much more strictly whenever possible. This means that scripts that previously ran just fine may slow down due to memory constraints. For example, even with the default limit of 4GB, scripts like mc_degrain could actually use 16GB+ of memory when processing 4K material. That doesn't sound so bad until the Linux out-of-memory killer strikes and many scripts can't complete at all.

This version improves things by limiting the number of threads running concurrently when memory is constrained. For example, in the case of MVTools (super-analyze-degrain3) with 4K 10-bit material, each additional thread can require approximately 500MB more memory. Previously, all limiting would fail once there were no more frames stored in the caches and only the working set of the running filters remained. Now, when this state is reached, the number of running threads is reduced. Note that if you have a 16-core CPU that can run 32 hardware threads at once, reducing the count to 8-16 actually running threads usually has very little real effect, since consumer CPUs with dual-channel RAM will be memory bandwidth limited most of the time anyway.
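The throttling idea described above can be sketched roughly like this. This is a hypothetical illustration of the heuristic, not the actual R76 implementation; all names and the decision rule are assumptions based on the description in this note:

```python
# Hypothetical sketch of the throttling described above: once the frame
# caches are empty but memory use still exceeds the limit, eviction can no
# longer help, so concurrency is reduced instead.
def throttle_threads(active_threads: int, cached_frames: int,
                     memory_used_mb: int, memory_limit_mb: int,
                     min_threads: int = 1) -> int:
    """Return the new number of threads allowed to run concurrently."""
    over_budget = memory_used_mb > memory_limit_mb
    # Only the working sets of the running filters remain when the caches
    # hold zero frames; cutting a thread frees roughly one working set.
    if over_budget and cached_frames == 0 and active_threads > min_threads:
        return active_threads - 1
    return active_threads
```

For instance, with 32 threads, empty caches, and 6000MB in use against a 4000MB limit, `throttle_threads(32, 0, 6000, 4000)` steps the count down to 31; repeated application keeps shrinking it until memory fits or the floor is reached.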

However, if you see the thread count drop from 32 down to 1-4, you should REALLY increase the maximum cache size.
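As a rough guide for picking a new limit, the ~500MB-per-thread figure quoted above for 4K 10-bit MVTools can be turned into a back-of-the-envelope estimate. The helper function and the 25% headroom factor are illustrative assumptions, not part of VapourSynth; only the `core.max_cache_size` attribute (value in MB) is the real API:

```python
# Back-of-the-envelope cache sizing based on the ~500MB-per-thread figure
# quoted for MVTools on 4K 10-bit material. The headroom factor is an
# arbitrary illustrative choice, not a VapourSynth recommendation.
def suggested_cache_mb(threads: int, mb_per_thread: int = 500,
                       headroom: float = 1.25) -> int:
    """Rough cache size in MB needed to keep all threads busy."""
    return int(threads * mb_per_thread * headroom)

# In a VapourSynth script the limit is raised via the core object, e.g.:
#   import vapoursynth as vs
#   vs.core.max_cache_size = suggested_cache_mb(16)  # value is in MB
```

So a script that should keep 16 threads busy would want a limit in the region of `suggested_cache_mb(16)` = 10000MB rather than the 4GB default.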

Two serious bugs were also fixed. One caused corrupt output from the "generic filters" (Maximum, Minimum, 3×3 Convolution, and so on) in some compiles in the AVX2 code path. The other bug is much more specific and could cause memory leaks if an API4 filter requested frames from a node not properly declared as a dependency. If that sounds very specific, it's because it's exactly what MVTools v25 & v26 did.