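DocFace: Face Recognition System for ID Document Photos

Overview

This repository contains the TensorFlow implementation of DocFace and DocFace+, a system for matching ID photos against live face photos (selfies). DocFace has been shown to significantly outperform general face matchers on the ID-Selfie matching problem. We provide here the example training code and the pre-trained models from the papers. For preprocessing, we follow the SphereFace repository and align face images with MTCNN; users may also use other face alignment methods. Because the dataset used in the papers is private, we cannot release it here. Users can test the system on their own datasets.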

Citation

[hidecontent type="logged" desc="隐藏内容：登录后可查看"]

@article{shi2018docface+,
  title = {DocFace+: ID Document to Selfie Matching},
  author = {Shi, Yichun and Jain, Anil K.},
  journal = {arXiv:1809.05620},
  year = {2018}
}
@article{shi2018docface,
  title = {DocFace: Matching ID Document Photos to Selfies},
  author = {Shi, Yichun and Jain, Anil K.},
  journal = {arXiv:1805.02283},
  year = {2018}
}

Requirements

Requires Python 3.
Requires TensorFlow r1.2 or newer.
Run pip install -r requirements.txt to install the other dependencies.
Requires Matlab 2014b and Caffe for MTCNN face alignment.

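Usage

Part 1: Preprocessing

1.1 Dataset Structure

Download the Ms-Celeb-1M and LFW datasets for training and testing the base model. Other datasets such as CASIA-Webface can also be used for training. Because Ms-Celeb-1M is known to be a very noisy dataset, we use the clean list provided by Wu et al. Arrange the Ms-Celeb-1M and LFW datasets in the following structure, where each subfolder represents one subject: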

Aaron_Eckhart
    Aaron_Eckhart_0001.jpg
Aaron_Guiel
    Aaron_Guiel_0001.jpg
Aaron_Patterson
    Aaron_Patterson_0001.jpg
Aaron_Peirsol
    Aaron_Peirsol_0001.jpg
    Aaron_Peirsol_0002.jpg
    Aaron_Peirsol_0003.jpg
    Aaron_Peirsol_0004.jpg
...

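For the ID-Selfie dataset, make sure every folder follows the structure below, where ID image file names start with "A" and selfie file names start with "B":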

Subject1
    A001.jpg
    B001.jpg
    B002.jpg
Subject2
    A001.jpg
    B001.jpg
...
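
The ID-Selfie layout above can be checked before training with a small script. The following is a minimal sketch, not part of the repository: the function name `check_id_selfie_dataset` and the wording of the reported problems are hypothetical, and it only assumes the "A"/"B" file name prefixes described above.

```python
import os


def check_id_selfie_dataset(root):
    """Return {subject: problem} for subject folders missing an ID or a selfie image."""
    problems = {}
    for subject in sorted(os.listdir(root)):
        folder = os.path.join(root, subject)
        if not os.path.isdir(folder):
            continue  # ignore stray files next to the subject folders
        names = os.listdir(folder)
        has_id = any(name.startswith("A") for name in names)
        has_selfie = any(name.startswith("B") for name in names)
        if not has_id:
            problems[subject] = "no ID image (A*)"
        elif not has_selfie:
            problems[subject] = "no selfie image (B*)"
    return problems
```

An empty dictionary means every subject folder contains at least one ID image and one selfie.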

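1.2 Face Alignment

To ensure performance, we follow SphereFace and align all face images with the original MATLAB version of MTCNN. We provide a simpler script here for aligning a given dataset folder. To use it, install Caffe for Matlab and clone the MTCNN repo and Pdollar's toolbox repo. Then fill in their paths in the following lines of align/face_detect_align.m: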

imglist = importdata('/path/to/input/imagelist.txt');
output_dir = '/path/to/output/dataset';
...
matCaffe       = '/path/to/caffe/matlab/';
pdollarToolbox = '/path/to/toolbox';
MTCNN          = '/path/to/mtcnn/code/codes/MTCNNv1';
...
modelPath = '/path/to/mtcnn/code/codes/MTCNNv1/model';

Run the following command in Matlab to align the faces:

run align/face_detect_align.m
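
The alignment script reads its input from an image list file. One way to produce such a file is sketched below in Python. The format is an assumption (one image path per line), so check align/face_detect_align.m for what it actually expects. The helper names `build_image_list` and `write_image_list` and the set of accepted extensions are hypothetical.

```python
import os

# Assumed set of image extensions; adjust to match the dataset.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def build_image_list(dataset_dir):
    """Collect image paths under dataset_dir in a stable, sorted order."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(dataset_dir):
        dirnames.sort()  # visit subject folders in alphabetical order
        for name in sorted(filenames):
            if name.lower().endswith(IMAGE_EXTENSIONS):
                paths.append(os.path.join(dirpath, name))
    return paths


def write_image_list(dataset_dir, list_path):
    """Write one image path per line (assumed format) and return the count."""
    paths = build_image_list(dataset_dir)
    with open(list_path, "w") as f:
        for path in paths:
            f.write(path + "\n")
    return len(paths)
```

For example, `write_image_list('/path/to/input/dataset', '/path/to/input/imagelist.txt')` would write the list that `imglist` is then loaded from.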

Part 2: Training

Note: In this part, we assume you are in the directory $DOCFACE_ROOT/

2.1 Training the Base Model

Set the dataset paths in config/basemodel.py:

# Training dataset path
train_dataset_path = '/path/to/msceleb1m/dataset/folder'

# Testing dataset path
test_dataset_path = '/path/to/lfw/dataset/folder'

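Because of the memory cost, users may need multiple GPUs to use a batch size of 256 on Ms-Celeb-1M. In particular, we used four GTX 1080 Ti GPUs. In that case, change the following entry in config/basemodel.py to the number of GPUs you use: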
```
# Number of GPUs
num_gpus = 1
```
Run the following command in the terminal:
```
python src/train_base.py config/basemodel.py
```
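After training, a model folder will appear under log/faceres_ms/. We will use it for fine-tuning. If the training code is run multiple times, there will be multiple folders, each named by its timestamp. Users can also skip this part and use the pre-trained base model we provide.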
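
When several timestamped folders exist under log/faceres_ms/, the most recent one can be located programmatically. The sketch below is not part of the repository. Since the exact timestamp format of the folder names is not stated here, it relies on the folder modification time instead, and the function name `latest_model_dir` is hypothetical.

```python
import os


def latest_model_dir(log_dir):
    """Return the most recently modified subfolder of log_dir."""
    subdirs = [
        os.path.join(log_dir, name)
        for name in os.listdir(log_dir)
        if os.path.isdir(os.path.join(log_dir, name))
    ]
    if not subdirs:
        raise FileNotFoundError("no model folders found in " + log_dir)
    # Use modification time rather than parsing the folder name.
    return max(subdirs, key=os.path.getmtime)
```

The returned path could then be used as the model folder to restore from when fine-tuning.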

2.2 Fine-tuning on the ID-Selfie Dataset

Set the dataset paths and the pre-trained model path in config/finetune.py:

# Training dataset path
train_dataset_path = '/path/to/training/dataset/folder'

# Testing dataset path
test_dataset_path = '/path/to/testing/dataset/folder'

...

# The model folder from which to restore the parameters
restore_model = '/path/to/the/pretrained/model/folder'

Adjust the loss function parameters in config/finetune.py according to your dataset, for example:
```
# Loss functions and their parameters.
losses = {
    'diam': {'scale': 'auto', 'm': 5.0, 'alpha':1.0}
}
```
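In our experiments, we found it unnecessary to choose "scale" manually, but in some cases changing "scale" to a fixed value may help. A smaller "alpha" should be preferred when the average number of samples per class is larger.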
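
To give a rough intuition for how a scale and a margin interact in a margin-based softmax loss, here is a minimal pure-Python sketch of a generic variant in which the margin is subtracted from the scaled target logit. It is not the repository's 'diam' loss. The way `scale` and `m` enter the formula is an assumption, and the 'auto' scale and the 'alpha' parameter are not modeled. The function name is hypothetical.

```python
import math


def margin_softmax_loss(cosines, label, scale, m):
    """Cross-entropy over scaled cosine logits with a margin on the target class.

    cosines: cosine similarity between one embedding and each class weight.
    label:   index of the ground-truth class.
    """
    logits = [scale * c for c in cosines]
    logits[label] -= m  # assumed placement: margin applied after scaling
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]
```

In this sketch, a larger `m` makes the loss stricter for the same cosine values, and a larger fixed `scale` sharpens the softmax.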

Run the following command in the terminal to start fine-tuning:

python src/train_sibling.py config/finetune.py

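Part 3: Feature Extraction

Note: In this part, we assume you are in the directory $DOCFACE_ROOT/

To extract features with a pre-trained model (either the base network or the sibling networks), prepare a .txt file listing the images. The images should be aligned in the same way as the training dataset. Then run the following command in the terminal: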

python src/extract_features.py \
--model_dir /path/to/pretrained/model/dir \
--image_list /path/to/imagelist.txt \
--output /path/to/output.npy

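Note that the images in the image list must follow the same naming convention as the training dataset, i.e. ID photos start with "A**" and selfies start with "B**".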
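
Once features are extracted, ID photos can be compared against selfies. The sketch below is not part of the repository: it splits an image list by the "A"/"B" naming convention and computes a cosine similarity between two feature vectors. The helper names are hypothetical, and it assumes the saved features are stored one row per listed image, in list order.

```python
import math
import os


def split_id_selfie(image_paths):
    """Return (id_indices, selfie_indices) based on the A*/B* file name prefix."""
    id_indices, selfie_indices = [], []
    for i, path in enumerate(image_paths):
        name = os.path.basename(path)
        if name.startswith("A"):
            id_indices.append(i)
        elif name.startswith("B"):
            selfie_indices.append(i)
    return id_indices, selfie_indices


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

If the output file is a NumPy array as the .npy extension suggests, `numpy.load('/path/to/output.npy')` would return the rows to pass to `cosine_similarity`, indexed with the results of `split_id_selfie`.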

Models

BaseModel (unconstrained face matching): Google Drive | Baidu Yun
Fine-tuned DocFace model: (contact the author)

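Results

With our pre-trained base model, one should be able to reach 99.67% on the standard LFW verification protocol and 99.60% on the BLUFR protocol. Training Face-ResNet on Ms-Celeb-1M with our code should give similar results.
With the proposed Max-margin Pairwise Score loss and sibling networks, DocFace achieves a significant improvement over the Base Model on our private ID-Selfie dataset after transfer learning.
Results of DIAM-Softmax and DocFace+ are reported on the combination of ID-Selfie-A, ID-Selfie-B and another larger dataset, where most classes have only two images (one ID photo and one selfie).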

License

MIT License

[/hidecontent]