Upscaling Before or After Depth: The Pipeline Order Debate
Should you upscale your footage before or after estimating depth? The answer depends on what you are optimizing for, and it matters more than you think.
Pipeline order is one of those decisions that seems trivial until you see the results. In a typical media processing chain, you might upscale footage from 1080p to 4K, estimate depth, generate stereo, and encode. But should upscaling happen first — giving the depth model more pixels to work with — or last, after all the AI processing is done at the source resolution?
The argument for upscaling first is intuitive. More pixels means more detail for the depth model to analyze, and a 4K frame has four times as many pixels as a 1080p frame. Depth models should, in theory, produce better results with more input detail: finer edges, more accurate object boundaries, fewer ambiguous regions.
The argument for upscaling last is computational. Running depth estimation at 4K instead of 1080p takes roughly 4x the GPU memory and 3-4x the processing time. If you are converting a feature film, which runs to well over 100,000 frames at 24 fps, this difference compounds into days of additional processing. And the depth model's internal resolution is often lower than the input anyway: many models downsample to 384x384 or 512x512 internally before upsampling their output. Feeding them 4K input just means they downsample more aggressively.
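The arithmetic behind that argument is easy to check. The sketch below is illustrative only: it assumes the depth model resizes the longer side of the frame to its internal size (real models differ in how they resize), and the 90-minute runtime, 0.5 seconds per frame, and 3.5x slowdown are placeholder figures, not measurements.

```python
RES_1080P = (1920, 1080)
RES_4K = (3840, 2160)


def pixel_count(width, height):
    """Total number of pixels in a frame."""
    return width * height


def downsample_factor(width, height, internal_side):
    """Per-axis shrink factor if the longer side is resized to internal_side.

    Assumption: the model fits the longer side to its internal resolution.
    """
    return max(width, height) / internal_side


def extra_hours(frames, seconds_per_frame, slowdown):
    """Additional processing hours when each frame takes `slowdown` times longer."""
    return frames * seconds_per_frame * (slowdown - 1) / 3600


# 4K carries four times as many pixels as 1080p.
pixel_ratio = pixel_count(*RES_4K) / pixel_count(*RES_1080P)

# How hard the model shrinks each input, for the two internal sizes mentioned above.
factors = {
    side: (downsample_factor(*RES_1080P, side), downsample_factor(*RES_4K, side))
    for side in (384, 512)
}

# Hypothetical feature film: 90 minutes at 24 fps.
film_frames = 90 * 60 * 24

# Placeholder timing: 0.5 s per frame at 1080p, 3.5x slower at 4K.
added = extra_hours(film_frames, seconds_per_frame=0.5, slowdown=3.5)
```

With a 384-pixel internal size, a 1080p frame is shrunk 5x per axis and a 4K frame 10x, so the extra input pixels are mostly thrown away before the model sees them. Under the placeholder timings, the 4K run adds 45 hours to an 18-hour job.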
Our testing shows the practical answer is nuanced. For depth estimation specifically, upscaling first produces marginally better results — perhaps 2-5% improvement in edge accuracy — because the additional detail helps the model resolve ambiguous boundaries. But the improvement vanishes if the source footage is already sharp at its native resolution. Upscaling a clean 1080p source to 4K before depth estimation is helpful; upscaling a noisy or compressed 720p source to 4K before depth estimation can actually hurt, because the upscaler hallucinates detail that confuses the depth model.
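One way to act on this is to gate the pipeline order on how noisy the source is. The sketch below is a minimal illustration, not our implementation: the `SourceInfo` type, the 0-to-1 `noise_level` score, and the 0.3 threshold are all hypothetical, and the score would have to come from whatever noise or compression metric you trust.

```python
from dataclasses import dataclass

# Hypothetical cutoff; tune it against your own footage.
NOISE_THRESHOLD = 0.3


@dataclass
class SourceInfo:
    label: str
    noise_level: float  # hypothetical score: 0.0 is clean, 1.0 is heavily degraded


def upscale_before_depth(src, noise_threshold=NOISE_THRESHOLD):
    """Return True when the source looks clean enough to upscale before depth.

    For noisy or heavily compressed sources the upscaler may hallucinate
    detail, so depth estimation runs on the original frames instead.
    """
    return src.noise_level < noise_threshold


clean = SourceInfo(label="clean 1080p master", noise_level=0.05)
noisy = SourceInfo(label="compressed 720p rip", noise_level=0.6)

decisions = {s.label: upscale_before_depth(s) for s in (clean, noisy)}
```

A single threshold is crude, but it captures the rule of thumb: clean sources go through the upscaler first, degraded ones do not.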
For frame interpolation, the order matters more. Interpolating at 1080p and then upscaling produces cleaner results than upscaling first and then interpolating, because the interpolation model has fewer pixels to track and is less likely to produce artifacts. Upscaling the interpolated result takes advantage of the AI upscaler's ability to add coherent detail to already-smooth motion.
The default pipeline order we settled on — extract, upscale, interpolate, depth, stereo, encode — represents our best compromise across content types. But the right answer for your specific footage might be different, which is why the pipeline is reorderable. If you know your source is clean 1080p, move upscaling before depth. If your source is noisy, keep the default order and let the upscaler clean it first.
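To make "reorderable" concrete, here is a minimal sketch of a pipeline whose stage order is just a list. The stage names match the default order above; `move_stage`, `run_pipeline`, and the placeholder stage functions are hypothetical and only record the order in which they run.

```python
DEFAULT_ORDER = ["extract", "upscale", "interpolate", "depth", "stereo", "encode"]


def move_stage(order, stage, *, before=None, after=None):
    """Return a new order with `stage` placed directly before or after another stage."""
    if (before is None) == (after is None):
        raise ValueError("specify exactly one of 'before' or 'after'")
    if stage not in order:
        raise ValueError(f"unknown stage: {stage}")
    target = before if before is not None else after
    if target == stage:
        raise ValueError("a stage cannot be moved relative to itself")
    remaining = [s for s in order if s != stage]
    idx = remaining.index(target)  # raises ValueError if the target is unknown
    if after is not None:
        idx += 1
    return remaining[:idx] + [stage] + remaining[idx:]


def run_pipeline(order, stages, data):
    """Apply each stage function in the given order."""
    for name in order:
        data = stages[name](data)
    return data


# Placeholder stages: each one just appends its name to a log.
stages = {name: (lambda log, n=name: log + [n]) for name in DEFAULT_ORDER}

# Example: run depth at source resolution and upscale afterwards.
depth_first = move_stage(DEFAULT_ORDER, "upscale", after="depth")
log = run_pipeline(depth_first, stages, [])
```

Because the order is plain data, trying a different arrangement on a short test clip costs one function call, which is the cheapest way to find out what your footage prefers.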