Core capability

Depth Estimation

Per-pixel depth from any 2D frame.

Model architectures

Native inference resolution

~0.7s per frame at 1080p (RTX 3060)

Technical overview

Depth estimation uses monocular depth inference — a single 2D image goes in, a full depth map comes out. anelo supports multiple model architectures (Depth Anything V2, MiDaS, ZoeDepth) so you can trade speed for precision depending on the source material.

Temporal smoothing reduces flicker between consecutive frames, which is critical for video — a per-frame model that produces slightly different depth maps on adjacent frames creates visible jitter in the stereo output. anelo applies configurable smoothing that preserves depth transitions at scene cuts while enforcing consistency within shots.
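One way to picture smoothing that resets at scene cuts is an exponential moving average over consecutive depth maps. The sketch below is illustrative only and is not anelo's actual implementation: the function names, the list-of-rows depth representation, and the idea that scene cuts arrive as a set of frame indices are all assumptions.

```python
from typing import Iterable, List, Optional, Set

DepthMap = List[List[float]]  # rows of normalized depth values (assumed layout)


def smooth_depth(
    previous: Optional[DepthMap],
    current: DepthMap,
    smoothing: float = 0.5,
    is_scene_cut: bool = False,
) -> DepthMap:
    """Blend the current depth map with the previous smoothed map."""
    if not 0.0 <= smoothing < 1.0:
        raise ValueError("smoothing must be in [0, 1)")
    # With no history, or at a scene cut, start fresh so depth from the
    # previous shot does not bleed into the new one.
    if previous is None or is_scene_cut:
        return [row[:] for row in current]
    return [
        [smoothing * p + (1.0 - smoothing) * c for p, c in zip(prev_row, cur_row)]
        for prev_row, cur_row in zip(previous, current)
    ]


def smooth_sequence(
    frames: Iterable[DepthMap], cuts: Set[int], smoothing: float = 0.5
) -> List[DepthMap]:
    """Smooth a whole clip; `cuts` holds indices of frames that start a new shot."""
    smoothed: List[DepthMap] = []
    previous: Optional[DepthMap] = None
    for index, frame in enumerate(frames):
        previous = smooth_depth(previous, frame, smoothing, index in cuts)
        smoothed.append(previous)
    return smoothed
```

In this sketch a higher `smoothing` value weights history more heavily, which matches the trade-off described above: steadier depth within a shot, at the risk of lagging behind fast motion.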

Inference runs at native resolution. There is no downscale-then-upscale trick that degrades edge quality: a 4K frame produces a 4K depth map. The preflight scoring stage flags frames with excessive motion blur or occlusion before depth estimation runs, so you do not waste GPU time on frames that will produce unreliable results.

Use cases

Convert 2D video to stereoscopic 3D for VR headsets
Generate depth maps for parallax scrolling or 3D photo effects
Feed downstream stereo conversion with high-quality depth data
Analyze scene geometry for visual effects compositing
Create depth-aware color grading or focus effects

Configuration

Model (select)
Depth inference model architecture. Each has different strengths for different content types.
Default: Depth Anything V2
Temporal smoothing (slider)
Reduces frame-to-frame flicker in depth maps. Higher values produce more consistent depth but may blur fast motion.
Default: 0.5
Strength (slider)
Overall depth intensity multiplier. Controls how pronounced the depth separation appears in downstream stereo output.
Default: 1.0
Edge refinement (toggle)
Sharpens depth transitions at object boundaries. Reduces halo artifacts in stereo output at the cost of slightly longer processing.
Default: On
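The four settings above and their defaults could be modeled as a small validated structure. This is a hypothetical sketch for illustration: the class name, field names, and the assumed 0 to 1 slider range for temporal smoothing are not taken from anelo itself; only the setting names, model list, and default values come from this page.

```python
from dataclasses import dataclass

# Model names as listed on this page.
MODELS = ("Depth Anything V2", "MiDaS", "ZoeDepth")


@dataclass(frozen=True)
class DepthSettings:
    """Hypothetical container mirroring the documented defaults."""

    model: str = "Depth Anything V2"
    temporal_smoothing: float = 0.5
    strength: float = 1.0
    edge_refinement: bool = True

    def __post_init__(self) -> None:
        if self.model not in MODELS:
            raise ValueError(f"unknown model: {self.model!r}")
        # The slider range is an assumption; the page only documents the default.
        if not 0.0 <= self.temporal_smoothing <= 1.0:
            raise ValueError("temporal_smoothing must be between 0 and 1")
        if self.strength < 0.0:
            raise ValueError("strength must be non-negative")


# Example: favor consistency on a slow-moving documentary shot.
steady = DepthSettings(model="ZoeDepth", temporal_smoothing=0.8)
```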

Pipeline stages

05 Depth Estimation

Generates a per-pixel depth map for every frame. This is the core of the 3D conversion — the depth map determines how far each pixel is from the virtual camera, enabling stereo synthesis.

Temporal smoothing reduces flicker between frames, and inference runs at native resolution.
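To illustrate how a depth value and the strength multiplier might translate into stereo separation, the sketch below maps a 16-bit depth sample to a horizontal pixel shift. The linear mapping, the `max_shift_px` parameter, and the function names are assumptions made for this example; anelo's actual stereo synthesis is not documented here.

```python
from typing import List


def disparity_px(
    depth_value: int,
    strength: float = 1.0,
    max_shift_px: float = 20.0,  # hypothetical maximum shift, not an anelo setting
    bit_depth: int = 16,
) -> float:
    """Map a depth sample to a horizontal shift in pixels.

    Assumes higher values mean closer to the camera, so closer pixels
    receive a larger shift between the left and right eye views.
    """
    max_value = (1 << bit_depth) - 1
    if not 0 <= depth_value <= max_value:
        raise ValueError("depth_value is outside the range for this bit depth")
    nearness = depth_value / max_value  # 0.0 = farthest, 1.0 = closest
    return strength * max_shift_px * nearness


def disparity_row(row: List[int], strength: float = 1.0) -> List[float]:
    """Apply the mapping to one scanline of a depth map."""
    return [disparity_px(value, strength) for value in row]
```

Under this assumed mapping, halving the strength halves every shift, which is one simple reading of "overall depth intensity multiplier".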

Depth Anything V2, MiDaS, ZoeDepth

Available models

Depth Anything V2

MiDaS

ZoeDepth

Output formats

Grayscale depth map (PNG)

Per-frame depth as 16-bit grayscale. Higher pixel values mean closer to camera.
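As a rough sketch of what a 16-bit grayscale depth PNG contains, the code below quantizes normalized nearness values to the 0 to 65535 range and encodes them using only the Python standard library. The helper names and the assumption that depth arrives as floats in [0, 1] with 1.0 meaning closest are illustrative; this is not anelo's writer.

```python
import struct
import zlib
from typing import List


def quantize_depth(nearness_rows: List[List[float]]) -> List[List[int]]:
    """Map nearness in [0, 1] to 16-bit integers; 1.0 (closest) becomes 65535."""
    return [
        [round(min(max(value, 0.0), 1.0) * 65535) for value in row]
        for row in nearness_rows
    ]


def _chunk(tag: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type and data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def encode_png16(rows: List[List[int]]) -> bytes:
    """Encode rows of 16-bit samples as a grayscale PNG byte string."""
    height, width = len(rows), len(rows[0])
    # IHDR: width, height, bit depth 16, color type 0 (grayscale),
    # then compression, filter, and interlace methods all 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 16, 0, 0, 0, 0)
    # Each scanline is a filter byte (0 = none) plus big-endian 16-bit samples.
    raw = b"".join(b"\x00" + struct.pack(f">{width}H", *row) for row in rows)
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(raw))
        + _chunk(b"IEND", b"")
    )
```

The 16-bit range gives 65536 depth levels per pixel instead of the 256 available in an 8-bit image, which helps avoid visible banding on smooth depth gradients.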

Colorized depth map (PNG)

Turbo or Inferno colormap for visualization and QA review. Not used for stereo conversion.

Workflows

Batch depth map generation

Point anelo at a folder of clips. Each file runs through scene analysis, frame extraction, and depth estimation. Output depth maps land in a mirrored folder structure, ready for downstream tools or stereo conversion.
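The mirrored folder layout can be sketched as a path mapping: each clip's location relative to the input folder is reproduced under the output folder. The function names, the per-clip output directory, and the list of video extensions below are assumptions for illustration, not anelo's documented behavior.

```python
from pathlib import Path
from typing import Iterator, Tuple

VIDEO_SUFFIXES = {".mp4", ".mov", ".mkv"}  # assumed set of clip extensions


def mirrored_output_dir(clip: Path, input_root: Path, output_root: Path) -> Path:
    """Return the output folder that mirrors the clip's place under input_root."""
    relative = clip.relative_to(input_root)
    # Drop the extension so each clip gets its own folder of depth frames.
    return output_root / relative.with_suffix("")


def plan_batch(input_root: Path, output_root: Path) -> Iterator[Tuple[Path, Path]]:
    """Yield (clip, output_dir) pairs for every clip found under input_root."""
    for clip in sorted(input_root.rglob("*")):
        if clip.is_file() and clip.suffix.lower() in VIDEO_SUFFIXES:
            yield clip, mirrored_output_dir(clip, input_root, output_root)
```

For example, under these assumptions a clip at `in/shoot/a.mp4` would map to the folder `out/shoot/a`.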

Model comparison on a single clip

Run the same source through Depth Anything V2, MiDaS, and ZoeDepth to compare output. Each model handles content differently: Depth Anything V2 excels at natural scenes, while ZoeDepth produces metrically accurate results for indoor environments.

Depth-to-stereo pipeline

The most common workflow: depth estimation feeds directly into stereo warping and compositing. One click runs the full pipeline — depth map generation, left/right eye synthesis, and final SBS or MV-HEVC output.

Stereo Conversion

Supporting capability

Quality Control

Industries using this

Film & TV Production, VR & Spatial Computing

Who uses this

Prosumers & Hobbyists, Indie Filmmakers & Documentary Teams

Start with a free desktop install.

Desktop processing is free with no job limits. Pro adds cloud processing, watermark removal, and advanced 3D controls.
