Depth Estimation
Per-pixel depth from any 2D frame.
- 3 model architectures
- 4K native inference resolution
- ~0.7s per frame at 1080p (RTX 3060)
Technical overview
Depth estimation uses monocular depth inference — a single 2D image goes in, a full depth map comes out. anelo supports multiple model architectures (Depth Anything V2, MiDaS, ZoeDepth) so you can trade speed for precision depending on the source material.
Temporal smoothing reduces flicker between consecutive frames, which is critical for video — a per-frame model that produces slightly different depth maps on adjacent frames creates visible jitter in the stereo output. anelo applies configurable smoothing that preserves depth transitions at scene cuts while enforcing consistency within shots.
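One way to picture this behavior is an exponential moving average that resets at scene cuts. The sketch below is an illustration under assumptions, not anelo's actual implementation: the function names, the `cut_threshold` parameter, and the way the smoothing value maps onto a blend weight are all hypothetical. Depth maps are represented as flattened lists of floats in [0, 1].

```python
from typing import List, Optional, Sequence

DepthMap = Sequence[float]  # one flattened depth map, values assumed in [0, 1]


def mean_abs_diff(a: DepthMap, b: DepthMap) -> float:
    """Average absolute per-pixel difference between two depth maps."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def smooth_depth_sequence(
    frames: Sequence[DepthMap],
    smoothing: float = 0.5,       # hypothetical mapping of the slider value
    cut_threshold: float = 0.3,   # hypothetical scene-cut sensitivity
) -> List[List[float]]:
    """Blend each depth map with the previous smoothed one, except at cuts."""
    if not 0.0 <= smoothing < 1.0:
        raise ValueError("smoothing must be in [0, 1)")

    smoothed: List[List[float]] = []
    prev_raw: Optional[DepthMap] = None
    prev_out: Optional[List[float]] = None

    for frame in frames:
        # A large jump between raw depth maps is treated as a scene cut,
        # so depth from the previous shot does not bleed into the new one.
        is_cut = prev_raw is None or mean_abs_diff(frame, prev_raw) > cut_threshold
        if is_cut:
            out = list(frame)
        else:
            out = [smoothing * p + (1.0 - smoothing) * c
                   for p, c in zip(prev_out, frame)]
        smoothed.append(out)
        prev_raw, prev_out = frame, out

    return smoothed
```

With `smoothing=0.5`, a frame inside a shot is averaged equally with the previous smoothed result, while a frame that differs sharply from its predecessor passes through unchanged.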
Inference runs at native resolution. There is no downscale-then-upscale trick that degrades edge quality. A 4K frame produces a 4K depth map. The preflight scoring stage flags frames with excessive motion blur or occlusion before depth estimation runs, so you do not waste GPU time on frames that would produce unreliable results.
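How preflight scoring is computed is not described here. As a rough illustration of flagging blurry frames before inference, a widely used sharpness heuristic is the variance of a Laplacian filter response: blurred frames have weak edges and therefore low variance. The function names and the `min_sharpness` threshold below are hypothetical, and the frame is assumed to be a grayscale image stored as a list of rows.

```python
from typing import List, Sequence

GrayFrame = Sequence[Sequence[float]]  # rows of grayscale pixel values


def laplacian_variance(gray: GrayFrame) -> float:
    """Variance of the 4-neighbour Laplacian over the interior pixels."""
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("frame must be at least 3x3 pixels")

    responses: List[float] = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)

    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def is_too_blurry(gray: GrayFrame, min_sharpness: float = 100.0) -> bool:
    """Flag a frame whose sharpness score falls below an assumed threshold."""
    return laplacian_variance(gray) < min_sharpness
```

A real preflight stage would also need an occlusion check and a threshold tuned to the footage; this sketch covers only the blur side.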
Use cases
- Convert 2D video to stereoscopic 3D for VR headsets
- Generate depth maps for parallax scrolling or 3D photo effects
- Feed downstream stereo conversion with high-quality depth data
- Analyze scene geometry for visual effects compositing
- Create depth-aware color grading or focus effects
Configuration
- Model (select)
Depth inference model architecture. Each has different strengths for different content types.
Default: Depth Anything V2
- Temporal smoothing (slider)
Reduces frame-to-frame flicker in depth maps. Higher values produce more consistent depth but may blur fast motion.
Default: 0.5
- Strength (slider)
Overall depth intensity multiplier. Controls how pronounced the depth separation appears in downstream stereo output.
Default: 1.0
- Edge refinement (toggle)
Sharpens depth transitions at object boundaries. Reduces halo artifacts in stereo output at the cost of slightly longer processing.
Default: On
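The four settings and their defaults can be summarized as a small settings object. This is a hypothetical sketch for scripting or documentation purposes: `DepthSettings`, `scaled_disparity`, the validation ranges, and `max_disparity_px` are assumptions, not part of anelo. Only the option names and default values come from the list above.

```python
from dataclasses import dataclass

# Model names as listed in the technical overview.
SUPPORTED_MODELS = ("Depth Anything V2", "MiDaS", "ZoeDepth")


@dataclass(frozen=True)
class DepthSettings:
    """Hypothetical container mirroring the documented defaults."""
    model: str = "Depth Anything V2"
    temporal_smoothing: float = 0.5
    strength: float = 1.0
    edge_refinement: bool = True

    def __post_init__(self) -> None:
        if self.model not in SUPPORTED_MODELS:
            raise ValueError(f"unknown model: {self.model!r}")
        # The slider range is an assumption; the page only states the default.
        if not 0.0 <= self.temporal_smoothing <= 1.0:
            raise ValueError("temporal_smoothing must be in [0, 1]")
        if self.strength < 0.0:
            raise ValueError("strength must be non-negative")


def scaled_disparity(normalized_depth: float,
                     settings: DepthSettings,
                     max_disparity_px: float = 30.0) -> float:
    """Treat strength as a plain multiplier on pixel disparity (assumed)."""
    return normalized_depth * settings.strength * max_disparity_px
```

Under this reading, doubling Strength doubles the horizontal offset that a given depth value produces in the stereo output.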
Pipeline stages
Generates a per-pixel depth map for every frame. This is the core of the 3D conversion: the depth map determines how far each pixel is from the virtual camera, which enables stereo synthesis.
Temporal smoothing reduces flicker between frames, and inference runs at native resolution.
Available models
- Depth Anything V2
- MiDaS
- ZoeDepth
Output formats
- Per-frame depth as 16-bit grayscale. Higher pixel values mean closer to the camera.
- Turbo or Inferno colormap for visualization and QA review. Not used for stereo conversion.
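To make the 16-bit convention concrete, the sketch below maps camera distances onto the 0 to 65535 range so that nearer pixels get higher values. The function name and the fixed `near` and `far` clipping planes are assumptions; the page does not say how anelo normalizes depth before writing it out.

```python
from typing import List, Sequence

MAX_16BIT = 65535  # largest value an unsigned 16-bit pixel can hold


def encode_depth_16bit(distances: Sequence[float],
                       near: float,
                       far: float) -> List[int]:
    """Convert distances to 16-bit values where higher means closer."""
    if far <= near:
        raise ValueError("far must be greater than near")

    encoded: List[int] = []
    for distance in distances:
        # Clamp to the assumed clipping planes, then invert so that
        # the nearest distance maps to 65535 and the farthest to 0.
        clamped = min(max(distance, near), far)
        closeness = (far - clamped) / (far - near)
        encoded.append(int(closeness * MAX_16BIT + 0.5))
    return encoded
```

Keeping `near` and `far` fixed across a shot, rather than renormalizing each frame, avoids introducing brightness flicker into the exported maps.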
Workflows
Batch depth map generation
Point anelo at a folder of clips. Each file runs through scene analysis, frame extraction, and depth estimation. Output depth maps land in a mirrored folder structure, ready for downstream tools or stereo conversion.
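The mirrored folder layout can be sketched with `pathlib`. The helper names and the extension list are hypothetical, and the choice of one output folder per clip is an assumption; the sketch only shows how an output location can be derived from a clip's position under the input folder.

```python
from pathlib import Path
from typing import List, Tuple

# Hypothetical set of container formats to pick up.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv"}


def mirrored_output_dir(input_root: Path, output_root: Path, clip: Path) -> Path:
    """Return the output folder that mirrors the clip's relative location."""
    relative = clip.relative_to(input_root)
    # Drop the file extension so each clip gets its own folder of depth maps.
    return output_root / relative.with_suffix("")


def plan_batch(input_root: Path, output_root: Path) -> List[Tuple[Path, Path]]:
    """Pair every clip under input_root with its mirrored output folder."""
    clips = sorted(
        p for p in input_root.rglob("*")
        if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS
    )
    return [(clip, mirrored_output_dir(input_root, output_root, clip))
            for clip in clips]
```

For example, a clip at `in/day1/shot01.mp4` would map to the folder `out/day1/shot01`.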
Model comparison on a single clip
Run the same source through Depth Anything V2, MiDaS, and ZoeDepth to compare output. Each model suits different content: Depth Anything V2 excels at natural scenes, while ZoeDepth produces metrically accurate results for indoor environments.
Depth-to-stereo pipeline
The most common workflow: depth estimation feeds directly into stereo warping and compositing. One click runs the full pipeline — depth map generation, left/right eye synthesis, and final SBS or MV-HEVC output.
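The core idea behind stereo warping is to shift each pixel horizontally by an amount that grows with its closeness, in opposite directions for the two eyes. The sketch below works on a single scanline and is a simplified illustration, not anelo's warping or compositing code: the function names, the nearest-neighbor hole filling, and the normalized depth convention (1.0 is closest) are assumptions.

```python
from typing import List, Optional, Sequence


def warp_scanline(pixels: Sequence[int],
                  depth: Sequence[float],
                  max_disparity: int,
                  direction: int) -> List[Optional[int]]:
    """Forward-warp one row; direction is +1 for left eye, -1 for right."""
    width = len(pixels)
    warped: List[Optional[int]] = [None] * width
    warped_depth = [-1.0] * width

    for x in range(width):
        target = x + direction * int(round(depth[x] * max_disparity))
        # When two pixels land on the same spot, the closer one wins.
        if 0 <= target < width and depth[x] > warped_depth[target]:
            warped[target] = pixels[x]
            warped_depth[target] = depth[x]

    # Fill holes left by disocclusion: copy from the left, then from the right.
    last: Optional[int] = None
    for x in range(width):
        if warped[x] is None:
            warped[x] = last
        else:
            last = warped[x]
    for x in reversed(range(width)):
        if warped[x] is None:
            warped[x] = last
        else:
            last = warped[x]
    return warped


def stereo_scanline_sbs(pixels: Sequence[int],
                        depth: Sequence[float],
                        max_disparity: int) -> List[Optional[int]]:
    """Build a side-by-side row: left-eye view followed by right-eye view."""
    left = warp_scanline(pixels, depth, max_disparity, direction=+1)
    right = warp_scanline(pixels, depth, max_disparity, direction=-1)
    return left + right
```

Production pipelines typically replace the naive hole filling with inpainting, which is where compositing quality is won or lost.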
Start with a free desktop install.
Desktop processing is free with no job limits. Pro adds cloud processing, watermark removal, and advanced 3D controls.