We present an algorithm for reconstructing dense, geometrically consistent depth for all pixels in a monocular video. We leverage a conventional structure-from-motion reconstruction to establish geometric constraints on pixels in the video. Unlike the ad-hoc priors used in classical reconstruction, we use a learning-based prior, i.e., a convolutional neural network trained for single-image depth estimation. At test time, we fine-tune this network to satisfy the geometric constraints of a particular input video, while retaining its ability to synthesize plausible depth details in parts of the video that are less constrained. We show through quantitative validation that our method achieves higher accuracy and a higher degree of geometric consistency than previous monocular reconstruction methods. Visually, our results appear more stable. Our algorithm is able to handle challenging hand-held captured input videos with a moderate amount of dynamic motion. The improved quality of the reconstruction enables several applications, such as scene reconstruction and advanced video-based visual effects.
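The core of the method is test-time training: the pretrained single-image depth network is fine-tuned on the input video itself, using pairwise geometric consistency losses built from the structure-from-motion poses and flow correspondences. Below is a minimal, self-contained sketch of that idea only; it is not the repository's actual training code, and DepthNet, the toy data, and the loss terms are illustrative placeholders (see main.py and the loss modules in the repository for the real implementation).

# Minimal sketch (NOT the repository's training code) of test-time fine-tuning:
# a single-image depth network is optimized so that depths of corresponding
# pixels in two frames agree after reprojection through the known poses.
import torch
import torch.nn as nn

class DepthNet(nn.Module):
    """Stand-in for a pretrained single-image depth CNN (e.g. Mannequin Challenge)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 1, 3, padding=1), nn.Softplus())
    def forward(self, x):
        return self.net(x)  # (B, 1, H, W), positive depth

def reproject(depth, K, K_inv, R, t, uv):
    """Lift pixels uv (N, 2) of frame 1 to 3D using depth, then map them into frame 2."""
    d = depth[0, 0, uv[:, 1], uv[:, 0]].unsqueeze(1)             # (N, 1) depths at uv
    pix_h = torch.cat([uv.float(), torch.ones(len(uv), 1)], 1)   # homogeneous pixels
    pts = (K_inv @ pix_h.T).T * d                                # 3D points in camera 1
    pts2 = (R @ pts.T).T + t                                     # 3D points in camera 2
    proj = (K @ pts2.T).T
    return proj[:, :2] / proj[:, 2:3], pts2[:, 2:3]              # reprojected uv, depth in cam 2

def consistency_loss(model, frame1, frame2, uv1, uv2, K, K_inv, R, t):
    d1, d2 = model(frame1), model(frame2)
    uv1_in_2, z1_in_2 = reproject(d1, K, K_inv, R, t, uv1)
    spatial = (uv1_in_2 - uv2.float()).norm(dim=1).mean()        # reprojection error
    disparity = (1.0 / z1_in_2.clamp(min=1e-3)
                 - 1.0 / d2[0, 0, uv2[:, 1], uv2[:, 0]].unsqueeze(1).clamp(min=1e-3)).abs().mean()
    return spatial + disparity

# Toy data standing in for two video frames, flow correspondences and relative pose.
H = W = 64
frame1, frame2 = torch.rand(1, 3, H, W), torch.rand(1, 3, H, W)
uv1 = torch.randint(0, W, (128, 2)); uv2 = (uv1 + 1).clamp(max=W - 1)
K = torch.tensor([[50., 0., 32.], [0., 50., 32.], [0., 0., 1.]]); K_inv = torch.inverse(K)
R, t = torch.eye(3), torch.tensor([0.05, 0.0, 0.0])

model = DepthNet()
opt = torch.optim.Adam(model.parameters(), lr=4e-4)
for step in range(20):                                           # test-time training on this one video
    opt.zero_grad()
    loss = consistency_loss(model, frame1, frame2, uv1, uv2, K, K_inv, R, t)
    loss.backward()
    opt.step()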
Pull the third-party packages:
git submodule update --init --recursive
Install the Python packages:
conda create -n consistent_depth python=3.6
conda activate consistent_depth
./scripts/install.sh
[Optional] Install COLMAP (see its official installation instructions); on Ubuntu you can try:
./scripts/install_colmap_ubuntu.sh
You can run the demo below without installing COLMAP. The demo takes 37 minutes when tested on one NVIDIA GeForce RTX 2080 GPU.
Download the models and the demo video together with its precomputed results:
./scripts/download_model.sh
./scripts/download_demo.sh results/ayush
Run:
python main.py --video_file data/videos/ayush.mp4 --path results/ayush \
--camera_params "1671.770118, 540, 960" --camera_model "SIMPLE_PINHOLE" \
--make_video
Here 1671.770118, 540, 960 are the camera intrinsics (f, cx, cy) and SIMPLE_PINHOLE is the camera model.
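For reference, here is a small sketch (not part of the repository) of how these parameter conventions map to a 3x3 pinhole intrinsic matrix; the values below are the demo values from the command above.

import numpy as np

def intrinsic_matrix(model, params):
    """Build a 3x3 pinhole intrinsic matrix from COLMAP-style camera parameters."""
    if model == "SIMPLE_PINHOLE":        # params = (f, cx, cy)
        f, cx, cy = params
        fx = fy = f
    elif model == "PINHOLE":             # params = (fx, fy, cx, cy)
        fx, fy, cx, cy = params
    else:
        raise ValueError(f"unsupported camera model: {model}")
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])

K = intrinsic_matrix("SIMPLE_PINHOLE", (1671.770118, 540, 960))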
You can inspect the test-time training process with:
tensorboard --logdir results/ayush/R_hierarchical2_mc/B0.1_R1.0_PL1-0_LR0.0004_BS4_Oadam/tensorboard/
You can find your results as below.
results/ayush/R_hierarchical2_mc
    videos/
        color_depth_mc_depth_colmap_dense_B0.1_R1.0_PL1-0_LR0.0004_BS4_Oadam.mp4  # comparison of disparity maps from Mannequin Challenge, COLMAP and ours
    B0.1_R1.0_PL1-0_LR0.0004_BS4_Oadam/
        depth/                   # final disparity maps
        checkpoints/0020.pth     # final checkpoint
        eval/                    # disparity maps and losses after each epoch of training
For quick demonstration and ease of installation, the demo runs everything (flow estimation, test-time training, etc.) except the COLMAP part. To also test the COLMAP part, delete results/ayush/colmap_dense and results/ayush/depth_colmap_dense, and then run the python command above again.
Please refer to params.py or run python main.py --help for the full list of parameters. Below are some examples of common ways to use the system.
Place your video file at $video_file_path.
[Optional] Calibrate the camera using the PINHOLE (fx, fy, cx, cy) or SIMPLE_PINHOLE (f, cx, cy) model. Intrinsic calibration is optional, but recommended for more accurate and faster camera registration. We usually calibrate by capturing a video of a textured plane with very slow camera motion while trying to let the target features cover the full field of view, selecting the non-blurry frames, and running COLMAP on those images.
Run without camera calibration:
python main.py --video_file $video_file_path --path $output_path --make_video
Run with camera calibration. For example, use the following command when the camera is calibrated with the PINHOLE model and fx, fy, cx, cy = 1660.161322, 1600, 540, 960:
python main.py --video_file $video_file_path --path $output_path \
--camera_model "PINHOLE" --camera_params "1660.161322, 1600, 540, 960" \
--make_video
To run with a specific monocular depth model, set --model_type accordingly:
python main.py --video_file $video_file_path --path $output_path \
--camera_model "PINHOLE" --camera_params "1660.161322, 1600, 540, 960" \
--make_video --model_type "${model_type}"
The supported model types are mc (Mannequin Challenge by Zhang et al., 2019), midas2 (MiDaS by Ranftl et al., 2019) and monodepth2 (Monodepth2 by Godard et al., 2019).
We rely on COLMAP for camera pose registration. If you have precomputed camera poses, you can instead provide them to the system in the folder $path as follows. (See here for an example file structure of $path.)
Extract the frames and save them as color_full/frame_%06d.png under $path. Save the frame timestamps in a frames.txt file with the following format (see an example here; a small helper sketch follows the listing):
number_of_frames
width
height
frame_000000_timestamp_in_seconds
frame_000001_timestamp_in_seconds
...
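To illustrate the format above, here is a hypothetical helper (not part of the repository) that writes such a file for a video using OpenCV; the exact timestamp formatting and output path are assumptions following the result structure described in this post.

# Hypothetical helper (not part of the repository): write a frames.txt
# in the format shown above, using OpenCV to read the frame count,
# resolution and per-frame timestamps of a video.
import cv2

def write_frames_txt(video_file, out_path):
    cap = cv2.VideoCapture(video_file)
    n = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = cap.get(cv2.CAP_PROP_FPS)
    cap.release()
    with open(out_path, "w") as f:
        f.write(f"{n}\n{w}\n{h}\n")
        for i in range(n):
            f.write(f"{i / fps:.6f}\n")   # timestamp of frame i in seconds

write_frames_txt("data/videos/ayush.mp4", "results/ayush/frames.txt")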
Put your camera poses in COLMAP sparse reconstruction format, i.e., images.txt, cameras.txt and points3D.txt (or the .bin equivalents), under colmap_dense/pose_init/. Note that the POINTS2D entries in images.txt and the points3D.txt file can be empty. Then run:
python main.py --path $path --initialize_pose
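As an illustration, here is a hedged sketch (not from the repository) of writing such minimal COLMAP text files from known poses. The output path, intrinsics, image names and pose values are placeholders; quaternions are in COLMAP's world-to-camera convention (QW, QX, QY, QZ) followed by the translation (TX, TY, TZ).

# Sketch: write minimal COLMAP text files for precomputed poses under
# $path/colmap_dense/pose_init/. All concrete values below are placeholders.
import os

def write_pose_init(out_dir, width, height, f, cx, cy, poses):
    """poses maps image name -> (qw, qx, qy, qz, tx, ty, tz)."""
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, "cameras.txt"), "w") as fp:
        # CAMERA_ID, MODEL, WIDTH, HEIGHT, PARAMS[]
        fp.write(f"1 SIMPLE_PINHOLE {width} {height} {f} {cx} {cy}\n")
    with open(os.path.join(out_dir, "images.txt"), "w") as fp:
        for i, (name, (qw, qx, qy, qz, tx, ty, tz)) in enumerate(sorted(poses.items()), 1):
            # IMAGE_ID, QW, QX, QY, QZ, TX, TY, TZ, CAMERA_ID, NAME
            fp.write(f"{i} {qw} {qx} {qy} {qz} {tx} {ty} {tz} 1 {name}\n")
            fp.write("\n")  # POINTS2D line, allowed to be empty here
    open(os.path.join(out_dir, "points3D.txt"), "w").close()  # may also be empty

# Placeholder example: two frames with identity rotation and a small x-translation.
write_pose_init("results/myvideo/colmap_dense/pose_init",
                width=1080, height=1920, f=1671.770118, cx=540, cy=960,
                poses={"frame_000000.png": (1, 0, 0, 0, 0.0, 0.0, 0.0),
                       "frame_000001.png": (1, 0, 0, 0, 0.05, 0.0, 0.0)})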
To get better poses on dynamic scenes, you can mask out dynamic objects when extracting features with COLMAP. Note that COLMAP >= 3.6 is required for extracting features with masked regions.
Extract the frames:
python main.py --video_file $video_file_path --path $output_path --op extract_frames
Run your favorite segmentation method (e.g., Mask R-CNN) on the images in $output_path/color_full to extract binary masks of dynamic objects (e.g., humans). No features will be extracted in regions where the mask image is black (pixel intensity 0 in grayscale). Following the COLMAP file-naming convention, save the mask for frame $output_path/color_full/frame_000010.png, for example, at $output_path/mask/frame_000010.png.png.
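For illustration only (not part of the repository), here is a sketch using torchvision's pretrained Mask R-CNN to black out people in such masks; the paths, score threshold and COCO class id 1 ("person") are assumptions you may need to adjust.

# Illustrative sketch: build COLMAP-style masks that black out people.
# Pixels with value 0 (black) get no features; everything else stays white.
import glob, os
import torch
from PIL import Image
from torchvision.models.detection import maskrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor

model = maskrcnn_resnet50_fpn(pretrained=True).eval()
frame_dir, mask_dir = "results/myvideo/color_full", "results/myvideo/mask"
os.makedirs(mask_dir, exist_ok=True)

for path in sorted(glob.glob(os.path.join(frame_dir, "frame_*.png"))):
    img = to_tensor(Image.open(path).convert("RGB"))
    with torch.no_grad():
        out = model([img])[0]
    keep = (out["labels"] == 1) & (out["scores"] > 0.5)           # person detections
    mask = torch.ones(img.shape[1:], dtype=torch.uint8) * 255     # white = keep features
    for m in out["masks"][keep]:                                  # (1, H, W) soft masks
        mask[m[0] > 0.5] = 0                                      # black out dynamic object
    # COLMAP convention: mask for frame_000010.png is saved as frame_000010.png.png
    Image.fromarray(mask.numpy(), mode="L").save(
        os.path.join(mask_dir, os.path.basename(path) + ".png"))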
Run the rest of the pipeline:
python main.py --path $output_path --mask_path $output_path/mask \
--camera_model "${camera_model}" --camera_params "${camera_intrinsics}" \
--make_video
The result folder has the following structure. Many of the files are only for debugging purposes.
frames.txt              # metadata about the number of frames, image resolution and timestamps for each frame
color_full/             # extracted frames in the original resolution
color_down/             # extracted frames in the resolution for disparity estimation
color_down_png/
color_flow/             # extracted frames in the resolution for flow estimation
flow_list.json          # indices of frame pairs to finetune the model with
flow/                   # optical flow
mask/                   # masks of consistent flow estimation between frame pairs
vis_flow/               # optical flow visualization. Green regions contain inconsistent flow.
vis_flow_warped/        # visualizing flow accuracy by warping one frame to another using the estimated flow. E.g., frame_000000_000032_warped.png warps frame_000032 to frame_000000.
colmap_dense/           # COLMAP results
    metadata.npz        # camera intrinsics and extrinsics converted from the COLMAP sparse reconstruction
    sparse/             # COLMAP sparse reconstruction
    dense/              # COLMAP dense reconstruction
depth_colmap_dense/     # COLMAP dense depth maps converted to disparity maps in .raw format
depth_${model_type}/    # initial disparity estimation using the original monocular depth model before test-time training
R_hierarchical2_${model_type}/
    flow_list_0.20.json           # indices of frame pairs passing the overlap ratio test with threshold 0.2. Same content as ../flow_list.json.
    metadata_scaled.npz           # camera intrinsics and extrinsics after scale calibration; these are the camera parameters used during test-time training
    scales.csv                    # frame indices and corresponding scales between the initial monocular disparity estimation and the COLMAP dense disparity maps
    depth_scaled_by_colmap_dense/ # monocular disparity estimation scaled to match the COLMAP disparity results
    vis_calibration_dense/        # for debugging scale calibration. frame_000000_warped_to_000029.png warps frame_000000 to frame_000029 using the scaled camera translations and the disparity maps from the initial monocular depth estimation.
    videos/                       # video visualization of the results
    B0.1_R1.0_PL1-0_LR0.0004_BS4_Oadam/
        checkpoints/              # checkpoint after each epoch
        depth/                    # final disparity map results after finishing test-time training
        eval/                     # intermediate losses and disparity maps after each epoch
        tensorboard/              # tensorboard log of the test-time training process
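Finally, a small sketch of how one might inspect the final disparity maps. I am assuming here that utils/image_io.py in the repository exposes a load_raw_float32_image helper for the .raw files listed above; if the actual reader has a different name, adapt the import accordingly.

# Sketch for inspecting final disparity maps. ASSUMPTION: the repository's
# utils/image_io.py provides a reader for the .raw float32 images (assumed
# here to be load_raw_float32_image); verify the actual function name.
import glob
import matplotlib.pyplot as plt
from utils.image_io import load_raw_float32_image  # assumed helper from this repo

depth_dir = "results/ayush/R_hierarchical2_mc/B0.1_R1.0_PL1-0_LR0.0004_BS4_Oadam/depth"
for path in sorted(glob.glob(f"{depth_dir}/**/*.raw", recursive=True))[:3]:
    disparity = load_raw_float32_image(path)        # H x W float32 disparity map
    plt.imshow(disparity, cmap="magma")
    plt.title(path)
    plt.colorbar(label="disparity (1 / depth, up to scale)")
    plt.show()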