The Uncanny Valley of AI Upscaling: When Too Sharp Feels Wrong
AI upscaling can over-sharpen footage until it stops looking like the original. Understanding the perceptual boundary prevents the most common mistake.
There is a moment in every upscaling project where the output crosses from "improved" to "uncanny." The image is sharper, the details are crisper, the resolution is higher — and yet something feels wrong. Skin looks like plastic. Fabric looks like it was rendered in a video game. Film grain disappears entirely, replaced by a clinical smoothness that no camera ever produced.
This is the uncanny valley of AI upscaling, and it happens because the model is doing exactly what it was trained to do: produce output that looks sharp and detailed. GAN-based upscalers such as Real-ESRGAN are trained in part against a discriminator network that rewards realistic-looking detail, and nothing in that objective says "preserve the original material's character." A 1970s documentary and a modern smartphone video both get the same treatment, regardless of whether that degree of sharpness suits the content.
The perceptual boundary varies by content type. For animation and game captures, aggressive upscaling almost always looks good, because these are synthetic images that benefit from crispness. For cinematic content shot on film, the tolerance for added sharpness is much lower. Film grain, lens softness, and gradual focus roll-off are part of the aesthetic. Removing them does not improve the image; it erases the cinematographer's intent.
The practical fix is the denoise strength parameter. Many AI upscaling tools expose a configurable denoise level that controls how aggressively the model removes noise and grain; Real-ESRGAN, for example, offers one for its general-purpose realesr-general-x4v3 model. At low denoise settings, the model adds detail while preserving the original texture. At high settings, it strips everything and regenerates clean surfaces. The default is usually somewhere in the middle, which is a reasonable starting point but rarely optimal for any specific source.
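To make the dial more concrete: one way a denoise strength can be implemented is by linearly blending the weights of two trained networks, one trained to denoise heavily and one trained to leave noise largely alone. The sketch below illustrates that idea with plain Python dictionaries standing in for model weights. The function name `blend_weights` and the toy values are hypothetical, and a given tool may implement its control differently.

```python
def blend_weights(strong, weak, denoise_strength):
    """Linearly interpolate two sets of model weights.

    strong / weak: dicts mapping parameter names to lists of floats
    (toy stand-ins for a strong-denoise and a weak-denoise model).
    denoise_strength: 0.0 returns the weak-denoise weights,
    1.0 returns the strong-denoise weights.
    """
    if not 0.0 <= denoise_strength <= 1.0:
        raise ValueError("denoise_strength must be between 0 and 1")
    if strong.keys() != weak.keys():
        raise ValueError("both models must have the same parameters")

    blended = {}
    for name in strong:
        if len(strong[name]) != len(weak[name]):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        blended[name] = [
            denoise_strength * s + (1.0 - denoise_strength) * w
            for s, w in zip(strong[name], weak[name])
        ]
    return blended


# Toy weights, invented for illustration only.
strong_model = {"conv1": [1.0, 2.0], "conv2": [0.0]}
weak_model = {"conv1": [3.0, 4.0], "conv2": [1.0]}

# A low setting stays close to the weak-denoise model,
# which is the grain-preserving end of the dial.
quarter = blend_weights(strong_model, weak_model, 0.25)
```

Seen this way, a mid-range default is simply an even mix of the two behaviours, which helps explain why it is a sensible starting point but seldom the best fit for a particular source.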
For archival footage — family home videos, old concert recordings, historical documentaries — start with a low denoise setting and increase it only if the source is genuinely damaged (heavy compression artifacts, visible noise). The goal is to improve resolution while keeping the footage recognizable as something that was actually filmed, not something that was generated.
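The "start low, raise only for damage" rule can be written down as a small lookup. This is a minimal sketch under stated assumptions: the 0.0 to 1.0 scale, the content-type labels, the numbers, and the helper name `starting_denoise` are all illustrative placeholders to tune by eye, not values taken from any tool.

```python
# Illustrative starting points on an assumed 0.0-1.0 denoise scale.
# These numbers are placeholders, not recommendations from any tool.
STARTING_DENOISE = {
    "animation": 0.8,      # synthetic imagery tolerates aggressive cleanup
    "game_capture": 0.8,
    "film": 0.2,           # keep grain and lens softness
    "archival": 0.2,
}


def starting_denoise(content_type, damaged=False, step=0.2):
    """Suggest a first denoise setting to try for a source.

    content_type: one of the keys above; unknown types fall back to 0.5.
    damaged: True for heavy compression artifacts or visible noise,
    which raises the suggestion by one step (capped at 1.0).
    """
    base = STARTING_DENOISE.get(content_type, 0.5)
    if damaged:
        base = min(1.0, base + step)
    return round(base, 2)


clean_home_video = starting_denoise("archival")
blocky_home_video = starting_denoise("archival", damaged=True)
```

Whatever the first value is, the check that matters is visual: compare the result against the source and step back down if skin or fabric starts to look synthetic.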
For modern high-quality sources that just need a resolution bump (1080p to 4K for a larger display), you may not need AI upscaling at all. A good Lanczos or bicubic resize produces perfectly acceptable results for clean, sharp sources. AI upscaling is most valuable when there is genuinely lost detail to recover — compressed footage, low-light captures, older cameras with limited resolving power. If the source is already sharp at its native resolution, the AI model has nothing to recover and will instead hallucinate detail that does not belong.
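One rough way to screen for "already sharp" sources is the variance of a Laplacian filter over a grayscale frame, a common focus measure. The sketch below is pure Python over a nested list of pixel values. The function names, the returned labels, and the threshold of 100.0 are assumptions made for illustration; any real threshold would need calibrating against your own footage.

```python
def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian over the interior pixels.

    gray: list of rows of numeric pixel values (at least 3x3).
    Higher values mean more fine detail and harder edges.
    """
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("image must be at least 3x3")

    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def choose_resize_method(gray, sharp_threshold=100.0):
    """Return 'lanczos' for sources that already score as sharp,
    'ai_upscale' otherwise. The threshold is a hypothetical placeholder."""
    if laplacian_variance(gray) >= sharp_threshold:
        return "lanczos"
    return "ai_upscale"


# Two synthetic 5x5 frames: a featureless one and a hard-edged checkerboard.
flat = [[128] * 5 for _ in range(5)]
checker = [[255 * ((x + y) % 2) for x in range(5)] for y in range(5)]

flat_choice = choose_resize_method(flat)
checker_choice = choose_resize_method(checker)
```

Noise and film grain also raise this score, so treat it as a first-pass screen rather than a verdict, and confirm by eye before committing a whole project to either path.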