MediaPipe Series 26: Face Detection (BlazeFace Architecture Deep Dive and IMS Integration)

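1. BlazeFace Design Background

1.1 Why a lightweight face detector?

Challenges on mobile and embedded targets:

Challenge	Problem with conventional approaches
Limited compute	ResNet/SSD-class detectors are too expensive
Real-time requirement	The pipeline needs 30+ FPS
Power sensitivity	GPU/NPU power budgets are tight
Model size	Storage on mobile devices is limited

BlazeFace design goals: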

┌─────────────────────────────────────────────────────────────┐
│                    BlazeFace Design Goals                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Performance targets                                        │
│  ├── Speed: real time on a mobile CPU (>30 FPS)             │
│  ├── Accuracy: WIDER FACE Easy > 95% AP                     │
│  ├── Model size: < 1 MB                                     │
│  └── Face size: effective for faces of 20-200 px            │
│                                                             │
│  Design principles                                          │
│  ├── Lightweight backbone built from BlazeBlocks            │
│  ├── Replaces hard NMS with weighted box blending           │
│  ├── Tuned for close-range faces (selfies, video calls)     │
│  └── Easy to port to different inference engines            │
│                                                             │
└─────────────────────────────────────────────────────────────┘

1.2 Comparison with other face detectors

Model	Parameters	Compute (M MAdds)	FPS (phone)	AP (WIDER FACE Easy)
SSD-ResNet50	25M	2000+	5-10	93%
RetinaFace-MobileNet	5M	500+	15-20	94%
BlazeFace	0.5M	~100	30-60	95%

2. BlazeFace Architecture in Detail

2.1 Overall architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                    BlazeFace Full Architecture                          │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  Input                                                                  │
│  ┌─────────────────────────────────────────────────────────┐            │
│  │                   128×128×3 RGB image                   │            │
│  └─────────────────────────────────────────────────────────┘            │
│                              │                                          │
│                              ▼                                          │
│  Backbone (feature extraction)                                          │
│  ┌─────────────────────────────────────────────────────────┐            │
│  │  Conv2D(24, 5×5, stride=2) + PReLU     → 64×64×24       │            │
│  │  BlazeBlock(24, 24, stride=1)          → 64×64×24       │            │
│  │  BlazeBlock(24, 48, stride=2)          → 32×32×48       │            │
│  │  BlazeBlock(48, 48, stride=1)          → 32×32×48       │            │
│  │  BlazeBlock(48, 48, stride=1)          → 32×32×48       │            │
│  │  BlazeBlock(48, 24, stride=2)          → 16×16×24       │            │
│  │  BlazeBlock(24, 24, stride=1)          → 16×16×24       │            │
│  │  BlazeBlock(24, 24, stride=1)          → 16×16×24       │            │
│  └─────────────────────────────────────────────────────────┘            │
│                              │                                          │
│                              ▼                                          │
│  Multi-scale feature fusion                                             │
│  ┌─────────────────────────────────────────────────────────┐            │
│  │  Feature maps: 16×16×24, 8×8×24 (via extra BlazeBlocks) │            │
│  │                                                         │            │
│  │  Anchor configuration:                                  │            │
│  │  ├── 16×16 map: 2 anchors/cell, stride 8                │            │
│  │  └── 8×8 map:   6 anchors/cell, stride 16               │            │
│  │                                                         │            │
│  │  Total anchors: 16×16×2 + 8×8×6 = 896                   │            │
│  └─────────────────────────────────────────────────────────┘            │
│                              │                                          │
│                              ▼                                          │
│  Output heads                                                           │
│  ┌───────────────────┐          ┌───────────────────┐                   │
│  │  Bounding Box     │          │  Keypoints        │                   │
│  │  Head             │          │  Head             │                   │
│  │                   │          │                   │                   │
│  │  Conv2D(4 values) │          │  Conv2D(12 values)│                   │
│  │  → (x, y, w, h)   │          │  → (x0, y0, ...,  │                   │
│  │                   │          │     x5, y5)       │                   │
│  └───────────────────┘          └───────────────────┘                   │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
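
The spatial sizes in the diagram follow from the strides alone, and the anchor total follows from the two head grids. The sketch below reproduces that arithmetic, assuming "same" padding so that a stride-s layer divides the side length by s. The names `BACKBONE`, `trace_shapes`, and `count_anchors` are illustrative and not part of MediaPipe.

```python
# Layers transcribed from the backbone diagram: (name, out_channels, stride).
BACKBONE = [
    ("Conv2D 5x5", 24, 2),
    ("BlazeBlock", 24, 1),
    ("BlazeBlock", 48, 2),
    ("BlazeBlock", 48, 1),
    ("BlazeBlock", 48, 1),
    ("BlazeBlock", 24, 2),
    ("BlazeBlock", 24, 1),
    ("BlazeBlock", 24, 1),
]


def trace_shapes(input_size=128, layers=BACKBONE):
    """Return (height, width, channels) after each layer, assuming 'same' padding."""
    size = input_size
    shapes = []
    for _name, channels, stride in layers:
        size //= stride  # a stride-s layer divides the spatial size by s
        shapes.append((size, size, channels))
    return shapes


def count_anchors(heads):
    """heads: iterable of (grid_size, anchors_per_cell) pairs."""
    return sum(grid * grid * per_cell for grid, per_cell in heads)


shapes = trace_shapes()
total_anchors = count_anchors([(16, 2), (8, 6)])
for (name, _, _), shape in zip(BACKBONE, shapes):
    print(f"{name:<12} -> {shape}")
print("total anchors:", total_anchors)
```

The total of 896 is also the number of rows the post-processing step has to scan per frame.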

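2.2 BlazeBlock design

Core idea: a two-branch structure (main branch plus shortcut)

import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """Depthwise k×k convolution followed by a 1×1 pointwise convolution."""

    def __init__(self, in_channels, out_channels, kernel_size=5, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(
            in_channels, in_channels, kernel_size, stride=stride,
            padding=kernel_size // 2, groups=in_channels)
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))


class BlazeBlock(nn.Module):
    """
    BlazeBlock: a lightweight residual block.

    Key points:
    1. 5×5 depthwise convolutions instead of 3×3
    2. Residual connection (identity when the shape is unchanged)
    3. PReLU activation (learnable negative slope)
    """

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.stride = stride

        # Main branch
        self.conv1 = DepthwiseSeparableConv(
            in_channels, out_channels,
            kernel_size=5, stride=stride
        )
        self.prelu1 = nn.PReLU(out_channels)

        self.conv2 = DepthwiseSeparableConv(
            out_channels, out_channels,
            kernel_size=5, stride=1
        )
        self.prelu2 = nn.PReLU(out_channels)

        # Shortcut branch: identity when the shape is unchanged. Otherwise
        # downsample spatially and project the channels so the addition is
        # valid. The 1×1 projection is an assumption of this sketch; it also
        # covers blocks that shrink the channel count (48 -> 24 above).
        shortcut = []
        if stride > 1:
            shortcut.append(nn.AvgPool2d(stride, stride))
        if in_channels != out_channels:
            shortcut.append(nn.Conv2d(in_channels, out_channels, kernel_size=1))
        self.shortcut = nn.Sequential(*shortcut)  # empty Sequential = identity

    def forward(self, x):
        residual = self.shortcut(x)

        # Main branch
        x = self.conv1(x)
        x = self.prelu1(x)
        x = self.conv2(x)

        # Residual connection, then the final activation
        return self.prelu2(x + residual)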
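
Why a 5×5 depthwise kernel is affordable can be checked with a quick multiply-add count. In a depthwise separable convolution the depthwise part costs H·W·C·k² and the pointwise 1×1 part costs H·W·C_in·C_out, so the pointwise part dominates once the channel count exceeds k². The sketch below uses the 32×32×48 stage from section 2.1 as an example; `separable_conv_madds` is a hypothetical helper, and biases and activations are ignored.

```python
def separable_conv_madds(height, width, in_ch, out_ch, kernel):
    """Multiply-adds of a depthwise separable convolution, split by part."""
    depthwise = height * width * in_ch * kernel * kernel
    pointwise = height * width * in_ch * out_ch
    return depthwise, pointwise


dw3, pw = separable_conv_madds(32, 32, 48, 48, kernel=3)
dw5, _ = separable_conv_madds(32, 32, 48, 48, kernel=5)

# Relative cost of the whole layer when the kernel grows from 3×3 to 5×5.
ratio = (dw5 + pw) / (dw3 + pw)

print(f"3x3: depthwise={dw3}, pointwise={pw}")
print(f"5x5: depthwise={dw5}, pointwise={pw}")
print(f"total cost ratio 5x5 / 3x3: {ratio:.2f}")
```

The kernel area grows by a factor of 25/9, but the layer as a whole grows far less because the pointwise term is unchanged, which is the trade-off the larger receptive field is bought with.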

2.3 Anchor design

BlazeFace places anchors on two feature-map scales:

┌─────────────────────────────────────────────────────────────┐
│                    Anchor Configuration                     │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Layer 1: 16×16 feature map                                 │
│  ├── Stride: 8 px                                           │
│  ├── Aspect ratio: 1.0 (square)                             │
│  ├── Anchors per cell: 2                                    │
│  └── Total: 16 × 16 × 2 = 512                               │
│                                                             │
│  Layer 2: 8×8 feature map                                   │
│  ├── Stride: 16 px                                          │
│  ├── Aspect ratio: 1.0 (square)                             │
│  ├── Anchors per cell: 6                                    │
│  └── Total: 8 × 8 × 6 = 384                                 │
│                                                             │
│  Total anchors: 512 + 384 = 896                             │
│                                                             │
│  Detection range:                                           │
│  ├── Smallest face: ~8 px (stride 8)                        │
│  └── Largest face: ~64 px (16 × 4 = 64)                     │
│                                                             │
└─────────────────────────────────────────────────────────────┘
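
The anchor layout above can be reproduced in a few lines. This sketch assumes that anchors are stored as normalized (x_center, y_center) pairs, that cells are visited row by row, and that every anchor in a cell shares the same center (offset 0.5). `generate_anchors` is an illustrative name, and the real ordering must match whatever the model was trained with.

```python
def generate_anchors(layers=((16, 2), (8, 6)), offset=0.5):
    """Return normalized anchor centers for each (grid_size, anchors_per_cell)."""
    anchors = []
    for grid, per_cell in layers:
        for y in range(grid):
            for x in range(grid):
                cx = (x + offset) / grid
                cy = (y + offset) / grid
                # All anchors of one cell share the cell center.
                anchors.extend([(cx, cy)] * per_cell)
    return anchors


anchors = generate_anchors()
print(len(anchors), anchors[0], anchors[512], anchors[-1])
```

Index 512 is the first anchor of the 8×8 layer, which is why its center sits at half a cell of the coarser grid.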

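2.4 Keypoint prediction

BlazeFace also predicts 6 facial keypoints for each detection:

Keypoint	Location	Use
0	Right eye center	Face alignment, pose estimation
1	Left eye center	Face alignment, pose estimation
2	Nose tip	Face alignment, pose estimation
3	Mouth center	Expression recognition
4	Right ear	Head pose estimation
5	Left ear	Head pose estimation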
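
As an illustration of the face alignment use listed for the eye keypoints, the in-plane rotation (roll) of a face can be estimated from the line joining the two eyes. This is a minimal sketch, assuming pixel coordinates with y pointing down and an unmirrored image in which the subject's right eye (keypoint 0) lies at a smaller x than the left eye (keypoint 1). `roll_angle_degrees` is a hypothetical helper.

```python
import math


def roll_angle_degrees(right_eye, left_eye):
    """Angle of the eye line relative to the image x axis, in degrees.

    Both arguments are (x, y) pixel coordinates. With y pointing down,
    a positive result means the face is rotated clockwise in the image.
    """
    dx = left_eye[0] - right_eye[0]
    dy = left_eye[1] - right_eye[1]
    return math.degrees(math.atan2(dy, dx))


level = roll_angle_degrees((40, 50), (80, 50))
tilted = roll_angle_degrees((40, 50), (80, 90))
print(level, tilted)
```

Normalized keypoints should be multiplied by the image width and height first, otherwise a non-square frame distorts the angle.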

3. MediaPipe Face Detection Integration

3.1 Graph configuration

# mediapipe/graphs/face_detection/face_detection_short_range.pbtxt

# Input
input_stream: "IMAGE:image"

# Output
output_stream: "DETECTIONS:detections"

# 1. Image format conversion
node {
  calculator: "ImageTransformationCalculator"
  input_stream: "IMAGE:image"
  output_stream: "IMAGE:converted_image"
  options {
    [mediapipe.ImageTransformationCalculatorOptions.ext] {
      output_format: SRGB
    }
  }
}

# 2. Resize to the model input size
node {
  calculator: "ImageTransformationCalculator"
  input_stream: "IMAGE:converted_image"
  output_stream: "IMAGE:resized_image"
  options {
    [mediapipe.ImageTransformationCalculatorOptions.ext] {
      output_width: 128
      output_height: 128
    }
  }
}

# 3. Convert to a tensor
node {
  calculator: "ImageToTensorCalculator"
  input_stream: "IMAGE:resized_image"
  output_stream: "TENSORS:tensors"
  options {
    [mediapipe.ImageToTensorCalculatorOptions.ext] {
      # The channel count follows from the RGB input; it is not a separate option.
      output_tensor_width: 128
      output_tensor_height: 128
      output_tensor_float_range {
        min: -1.0
        max: 1.0
      }
    }
  }
}

# 4. Model inference
node {
  calculator: "InferenceCalculator"
  input_stream: "TENSORS:tensors"
  output_stream: "TENSORS:output_tensors"
  options {
    [mediapipe.InferenceCalculatorOptions.ext] {
      model_path: "/models/blazeface.tflite"
      delegate {
        tflite {
          max_delegated_partitions: 1
        }
      }
    }
  }
}

# 5. Post-processing
node {
  calculator: "BlazeFacePostprocessorCalculator"
  input_stream: "TENSORS:output_tensors"
  input_stream: "ORIGINAL_IMAGE_SIZE:image_size"
  output_stream: "DETECTIONS:detections"
  options {
    [mediapipe.BlazeFaceOptions.ext] {
      score_threshold: 0.5
      min_suppression_threshold: 0.3
      num_keypoints: 6
      anchor_offset_x: 0.5
      anchor_offset_y: 0.5
      anchors {
        num_layers: 2
        strides: [8, 16]  # stride 8 -> 16×16 map, stride 16 -> 8×8 map
        aspect_ratios: [1.0]
        min_scale: 0.125
        max_scale: 0.75
        input_size_height: 128
        input_size_width: 128
      }
    }
  }
}
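
Step 3 maps 8-bit pixel values into the float range [-1.0, 1.0] that the model expects. The mapping is a plain linear rescale, sketched here in Python under the assumption of 8-bit input. `to_float_range` is an illustrative name, not a MediaPipe function.

```python
def to_float_range(pixel, lo=-1.0, hi=1.0):
    """Linearly map an 8-bit pixel value in [0, 255] to [lo, hi]."""
    return lo + (hi - lo) * (pixel / 255.0)


row = [0, 64, 128, 255]
normalized = [to_float_range(p) for p in row]
print(normalized)
```

Feeding the model values in a different range (for example [0, 1]) usually does not crash anything; it simply degrades the scores, which makes this step worth checking first when detections look weak.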

3.2 Post-processing calculator

// mediapipe/calculators/tflite/blazeface_postprocessor.cc

#include <algorithm>
#include <numeric>
#include <utility>
#include <vector>

#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/detection.pb.h"
#include "tensorflow/lite/interpreter.h"

namespace mediapipe {

class BlazeFacePostprocessorCalculator : public CalculatorBase {
 public:
  static absl::Status GetContract(CalculatorContract* cc);
  
  absl::Status Open(CalculatorContext* cc) override;
  absl::Status Process(CalculatorContext* cc) override;

 private:
  // Generate the anchors
  std::vector<std::pair<float, float>> GenerateAnchors();
  
  // Decode a bounding box
  std::vector<float> DecodeBox(
      const float* box_data, 
      const std::pair<float, float>& anchor);
  
  // Decode the keypoints
  std::vector<std::pair<float, float>> DecodeKeypoints(
      const float* keypoint_data,
      const std::pair<float, float>& anchor,
      const std::vector<float>& box);
  
  // Non-maximum suppression
  std::vector<int> NonMaxSuppression(
      const std::vector<std::vector<float>>& boxes,
      const std::vector<float>& scores,
      float nms_threshold);
  
  // Configuration
  float score_threshold_ = 0.5f;
  float min_suppression_threshold_ = 0.3f;
  int num_keypoints_ = 6;
  int num_anchors_ = 896;
  
  std::vector<std::pair<float, float>> anchors_;
};

absl::Status BlazeFacePostprocessorCalculator::Open(CalculatorContext* cc) {
  const auto& options = cc->Options<BlazeFaceOptions>();
  
  score_threshold_ = options.score_threshold();
  min_suppression_threshold_ = options.min_suppression_threshold();
  num_keypoints_ = options.num_keypoints();
  
  // Pre-generate the anchors
  anchors_ = GenerateAnchors();
  
  return absl::OkStatus();
}

absl::Status BlazeFacePostprocessorCalculator::Process(CalculatorContext* cc) {
  if (cc->Inputs().Tag("TENSORS").IsEmpty()) {
    return absl::OkStatus();
  }
  
  const auto& tensors = cc->Inputs().Tag("TENSORS").Get<std::vector<TfLiteTensor>>();
  
  // Assumed output layout: [boxes, scores, keypoints].
  // Scores are assumed to be probabilities already; apply a sigmoid first
  // if the model emits raw logits.
  const float* box_data = tensors[0].data.f;
  const float* score_data = tensors[1].data.f;
  const float* keypoint_data = tensors[2].data.f;
  
  // Collect every candidate above the score threshold, remembering which
  // anchor produced it so that keypoints can be decoded after NMS.
  std::vector<std::vector<float>> all_boxes;
  std::vector<float> all_scores;
  std::vector<int> all_anchor_indices;
  
  for (int i = 0; i < num_anchors_; ++i) {
    float score = score_data[i];
    
    if (score < score_threshold_) {
      continue;
    }
    
    // Decode the bounding box
    auto box = DecodeBox(box_data + i * 16, anchors_[i]);
    
    all_boxes.push_back(box);
    all_scores.push_back(score);
    all_anchor_indices.push_back(i);
  }
  
  // NMS
  auto keep_indices = NonMaxSuppression(
      all_boxes, all_scores, min_suppression_threshold_);
  
  // Build the output detections
  auto detections = absl::make_unique<std::vector<Detection>>();
  
  for (int idx : keep_indices) {
    Detection detection;
    // idx indexes the filtered candidates; map it back to the anchor index.
    const int anchor_idx = all_anchor_indices[idx];
    
    // Bounding box
    auto* location = detection.mutable_location_data();
    location->set_format(LocationData::RELATIVE_BOUNDING_BOX);
    auto* bbox = location->mutable_relative_bounding_box();
    bbox->set_xmin(all_boxes[idx][0]);
    bbox->set_ymin(all_boxes[idx][1]);
    bbox->set_width(all_boxes[idx][2] - all_boxes[idx][0]);
    bbox->set_height(all_boxes[idx][3] - all_boxes[idx][1]);
    
    // Score and label (both are repeated fields in the Detection proto)
    detection.add_score(all_scores[idx]);
    detection.add_label_id(0);  // face class
    
    // Keypoints
    auto keypoints = DecodeKeypoints(
        keypoint_data + anchor_idx * num_keypoints_ * 2,
        anchors_[anchor_idx],
        all_boxes[idx]);
    
    for (const auto& kp : keypoints) {
      auto* keypoint = location->add_relative_keypoints();
      keypoint->set_x(kp.first);
      keypoint->set_y(kp.second);
    }
    
    detections->push_back(detection);
  }
  
  cc->Outputs().Tag("DETECTIONS").Add(detections.release(), cc->InputTimestamp());
  
  return absl::OkStatus();
}

std::vector<float> BlazeFacePostprocessorCalculator::DecodeBox(
    const float* box_data,
    const std::pair<float, float>& anchor) {
  
  // box_data: [x_center, y_center, width, height, ...] (16 floats total),
  // in pixels of the 128×128 input; the center is an offset from the anchor.
  float x_center = box_data[0] / 128.0f + anchor.first;
  float y_center = box_data[1] / 128.0f + anchor.second;
  float width = box_data[2] / 128.0f;
  float height = box_data[3] / 128.0f;
  
  // Convert to [xmin, ymin, xmax, ymax]
  float xmin = x_center - width / 2.0f;
  float ymin = y_center - height / 2.0f;
  float xmax = x_center + width / 2.0f;
  float ymax = y_center + height / 2.0f;
  
  return {xmin, ymin, xmax, ymax};
}

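// IoU of two boxes given as [xmin, ymin, xmax, ymax].
static float CalculateIoU(const std::vector<float>& a,
                          const std::vector<float>& b) {
  const float iw = std::max(0.0f, std::min(a[2], b[2]) - std::max(a[0], b[0]));
  const float ih = std::max(0.0f, std::min(a[3], b[3]) - std::max(a[1], b[1]));
  const float inter = iw * ih;
  const float uni = (a[2] - a[0]) * (a[3] - a[1]) +
                    (b[2] - b[0]) * (b[3] - b[1]) - inter;
  return uni > 0.0f ? inter / uni : 0.0f;
}

std::vector<int> BlazeFacePostprocessorCalculator::NonMaxSuppression(
    const std::vector<std::vector<float>>& boxes,
    const std::vector<float>& scores,
    float nms_threshold) {
  
  std::vector<int> indices;
  std::vector<bool> suppressed(boxes.size(), false);
  
  // Indices sorted by descending score
  std::vector<int> order(scores.size());
  std::iota(order.begin(), order.end(), 0);
  std::sort(order.begin(), order.end(), 
            [&scores](int a, int b) { return scores[a] > scores[b]; });
  
  for (int i : order) {
    if (suppressed[i]) continue;
    
    indices.push_back(i);
    
    for (int j : order) {
      // Skip the kept box itself and anything already suppressed
      if (j == i || suppressed[j]) continue;
      
      // Compute the IoU
      float iou = CalculateIoU(boxes[i], boxes[j]);
      
      if (iou > nms_threshold) {
        suppressed[j] = true;
      }
    }
  }
  
  return indices;
}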

REGISTER_CALCULATOR(BlazeFacePostprocessorCalculator);

}  // namespace mediapipe
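
The calculator above uses classic greedy NMS, which keeps the top box of each overlapping cluster and discards the rest. BlazeFace's design replaces that suppression step with blending: overlapping boxes are averaged, weighted by score, which tends to reduce frame-to-frame jitter. The following is a minimal Python sketch of that idea, assuming boxes are (xmin, ymin, xmax, ymax) tuples with positive scores. `iou` and `weighted_nms` are illustrative names, and this is not the MediaPipe implementation.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def weighted_nms(boxes, scores, iou_threshold=0.3):
    """Blend each cluster of overlapping boxes into one score-weighted box.

    Returns a list of (box, score) pairs, one per cluster, where the score
    is that of the cluster's highest-scoring member.
    """
    remaining = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    results = []
    while remaining:
        best = remaining[0]
        # The cluster is the best box plus everything overlapping it enough.
        cluster = [best] + [i for i in remaining[1:]
                            if iou(boxes[best], boxes[i]) > iou_threshold]
        total = sum(scores[i] for i in cluster)
        if total > 0:
            blended = tuple(
                sum(scores[i] * boxes[i][k] for i in cluster) / total
                for k in range(4))
        else:
            blended = tuple(boxes[best])
        results.append((blended, scores[best]))
        in_cluster = set(cluster)
        remaining = [i for i in remaining if i not in in_cluster]
    return results


boxes = [(0.0, 0.0, 1.0, 1.0), (0.1, 0.0, 1.1, 1.0), (5.0, 5.0, 6.0, 6.0)]
scores = [0.9, 0.6, 0.8]
blended_results = weighted_nms(boxes, scores)
print(blended_results)
```

The first two boxes overlap and merge into one box that sits between them, pulled toward the higher-scoring one; the third box is isolated and passes through unchanged.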

4. IMS DMS Integration in Practice

4.1 Face detection in a DMS

┌─────────────────────────────────────────────────────────────────────────┐
│                    DMS Face Detection Flow                              │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  Input: IR camera (640×480)                                             │
│         │                                                               │
│         ▼                                                               │
│  ┌─────────────┐                                                        │
│  │ Face        │  Locate faces                                          │
│  │ Detection   │  - Is a driver present?                                │
│  └─────────────┘  - Multi-person scenes                                 │
│         │                                                               │
│         ▼                                                               │
│  ┌─────────────┐                                                        │
│  │ Face        │  Crop the face region                                  │
│  │ Crop        │  for downstream stages                                 │
│  └─────────────┘                                                        │
│         │                                                               │
│         ├──────────────┬──────────────┬──────────────┐                  │
│         ▼              ▼              ▼              ▼                  │
│  ┌───────────┐  ┌───────────┐  ┌───────────┐  ┌───────────┐             │
│  │ Face Mesh │  │ Eye State │  │ Head Pose │  │ Identity  │             │
│  │ (468 pts) │  │ Analysis  │  │ Estimation│  │ (optional)│             │
│  └───────────┘  └───────────┘  └───────────┘  └───────────┘             │
│                                                                         │
│  Use cases:                                                             │
│  ├── Driver presence detection                                          │
│  ├── Preprocessing for fatigue detection                                │
│  ├── Preprocessing for distraction detection                            │
│  └── Identity recognition (optional)                                    │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

4.2 Complete DMS face detection graph

# mediapipe/graphs/ims/dms_face_detection_graph.pbtxt

input_stream: "IR_IMAGE:ir_image"
output_stream: "DETECTIONS:detections"
output_stream: "CROPPED_FACE:cropped_face"

# 1. Face detection
node {
  calculator: "FaceDetectionShortRangeGpu"
  input_stream: "IMAGE:ir_image"
  output_stream: "DETECTIONS:raw_detections"
}

# 2. Select the driver's face (the largest one)
node {
  calculator: "PrimaryFaceSelectorCalculator"
  input_stream: "DETECTIONS:raw_detections"
  output_stream: "DETECTION:primary_detection"
  options {
    [mediapipe.PrimaryFaceSelectorOptions.ext] {
      selection_strategy: LARGEST
    }
  }
}

# 3. Expand the bounding box (leaves a margin for Face Mesh)
node {
  calculator: "BoundingBoxExpanderCalculator"
  input_stream: "DETECTION:primary_detection"
  output_stream: "DETECTION:expanded_detection"
  options {
    [mediapipe.BoundingBoxExpanderOptions.ext] {
      scale_x: 1.5  # expand 50% horizontally
      scale_y: 1.5  # expand 50% vertically
    }
  }
}

# 4. Crop the face region
node {
  calculator: "ImageCropperCalculator"
  input_stream: "IMAGE:ir_image"
  input_stream: "DETECTION:expanded_detection"
  output_stream: "IMAGE:cropped_face"
}

# Output
node {
  calculator: "DetectionToNotificationCalculator"
  input_stream: "DETECTION:primary_detection"
  output_stream: "DETECTIONS:detections"
}
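
The selection and expansion steps (2 and 3) reduce to a few lines of geometry. This sketch assumes that each detection is a normalized (xmin, ymin, width, height) tuple and that the expanded box is clamped to the image. `select_largest` and `expand_box` are hypothetical stand-ins for the custom calculators named in the graph.

```python
def select_largest(detections):
    """Return the detection with the largest box area, or None if empty."""
    if not detections:
        return None
    return max(detections, key=lambda d: d[2] * d[3])


def expand_box(box, scale_x=1.5, scale_y=1.5):
    """Scale a box around its center and clamp it to the unit square."""
    xmin, ymin, width, height = box
    cx = xmin + width / 2.0
    cy = ymin + height / 2.0
    half_w = width * scale_x / 2.0
    half_h = height * scale_y / 2.0
    x0 = max(0.0, cx - half_w)
    y0 = max(0.0, cy - half_h)
    x1 = min(1.0, cx + half_w)
    y1 = min(1.0, cy + half_h)
    return (x0, y0, x1 - x0, y1 - y0)


faces = [(0.1, 0.1, 0.2, 0.2), (0.5, 0.5, 0.3, 0.4)]
primary = select_largest(faces)
expanded = expand_box((0.4, 0.4, 0.2, 0.2))
clamped = expand_box((0.0, 0.0, 0.2, 0.2))
print(primary, expanded, clamped)
```

Clamping makes the crop asymmetric for faces near the frame edge, so a downstream model that expects a centered face may need padding instead of clamping.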

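4.3 Performance optimization

Optimization strategies for embedded platforms:

# Optimization 1: multi-threaded CPU inference via the XNNPACK delegate
# (to use NNAPI instead, swap the xnnpack block for `nnapi {}`)
node {
  calculator: "InferenceCalculator"
  options {
    [mediapipe.InferenceCalculatorOptions.ext] {
      model_path: "/models/blazeface.tflite"
      delegate {
        xnnpack {
          num_threads: 4
        }
      }
    }
  }
}

# Optimization 2: downsample the input image
node {
  calculator: "ImageTransformationCalculator"
  options {
    [mediapipe.ImageTransformationCalculatorOptions.ext] {
      output_width: 320
      output_height: 240
      scale_mode: FIT
    }
  }
}

# Optimization 3: quantize the model
# Post-training quantization is done with the TFLite converter, for example
# in Python by setting converter.optimizations = [tf.lite.Optimize.DEFAULT]
# on a tf.lite.TFLiteConverter before calling convert().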
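
When an input is resized with `scale_mode: FIT`, the aspect ratio is preserved and the remainder of the target is padded, so coordinates measured in the resized image have to be mapped back before they are used on the original frame. Below is a minimal sketch of that mapping, assuming symmetric (centered) padding. The 640×480 frame and 128×128 target are example values, and `letterbox_params` and `to_source_pixels` are illustrative names.

```python
def letterbox_params(src_w, src_h, dst_w, dst_h):
    """Return (scale, pad_x, pad_y) for fitting src into dst with padding."""
    scale = min(dst_w / src_w, dst_h / src_h)
    pad_x = (dst_w - src_w * scale) / 2.0
    pad_y = (dst_h - src_h * scale) / 2.0
    return scale, pad_x, pad_y


def to_source_pixels(x, y, src_w, src_h, dst_w, dst_h):
    """Map a pixel position in the padded target back to the source frame."""
    scale, pad_x, pad_y = letterbox_params(src_w, src_h, dst_w, dst_h)
    return (x - pad_x) / scale, (y - pad_y) / scale


scale, pad_x, pad_y = letterbox_params(640, 480, 128, 128)
center = to_source_pixels(64, 64, 640, 480, 128, 128)
print(scale, pad_x, pad_y, center)
```

Skipping this step shifts every box vertically on a 4:3 frame, which is easy to mistake for a model accuracy problem.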

5. Debugging and Testing

5.1 Visual debugging

# Visualize detections with the MediaPipe Python API
import mediapipe as mp
import cv2

mp_face_detection = mp.solutions.face_detection

cap = cv2.VideoCapture(0)

with mp_face_detection.FaceDetection(
    model_selection=0,  # 0: short-range, 1: full-range
    min_detection_confidence=0.5
) as face_detection:
  
  while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
      break
    
    # Convert the color space (OpenCV delivers BGR, MediaPipe expects RGB)
    image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    
    # Run detection
    results = face_detection.process(image)
    
    # Draw the results
    if results.detections:
      for detection in results.detections:
        # Draw the bounding box
        bboxC = detection.location_data.relative_bounding_box
        h, w, c = frame.shape
        x1 = int(bboxC.xmin * w)
        y1 = int(bboxC.ymin * h)
        x2 = int((bboxC.xmin + bboxC.width) * w)
        y2 = int((bboxC.ymin + bboxC.height) * h)
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        
        # Draw the keypoints
        for kp in detection.location_data.relative_keypoints:
          kpx = int(kp.x * w)
          kpy = int(kp.y * h)
          cv2.circle(frame, (kpx, kpy), 3, (0, 0, 255), -1)
        
        # Show the score (score is a repeated field, so take the first entry)
        cv2.putText(frame, f'{detection.score[0]:.2f}', (x1, y1 - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    
    cv2.imshow('Face Detection', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
      break

cap.release()
cv2.destroyAllWindows()

5.2 Performance testing

# Measure FPS on the target device
adb shell /data/local/tmp/mediapipe_cpu --calculator_graph_config_path=/sdcard/face_detection.pbtxt --input_stream_path=/sdcard/input_video.mp4 --output_stream_path=/sdcard/output_video.mp4

# Profile: filter the device log for performance counters
adb logcat | grep -E "(FPS|Latency|Memory)"
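
For quick host-side measurements before moving to the device, average latency and FPS can be collected with a small timing loop. The sketch uses only the standard library; `benchmark` is an illustrative helper, and `fn` stands for whatever callable runs one inference.

```python
import statistics
import time


def benchmark(fn, warmup=5, runs=50):
    """Time fn() and return mean latency (ms), 95th percentile (ms), and FPS."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        fn()  # warm-up iterations are not measured
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1000.0)
    latencies.sort()
    mean_ms = statistics.mean(latencies)
    p95_ms = latencies[int(0.95 * (len(latencies) - 1))]
    fps = 1000.0 / mean_ms if mean_ms > 0 else float("inf")
    return {"mean_ms": mean_ms, "p95_ms": p95_ms, "fps": fps}


# Stand-in workload; replace with a call that runs one detection.
stats = benchmark(lambda: sum(range(10000)), warmup=2, runs=20)
print(stats)
```

Reporting the 95th percentile alongside the mean helps expose occasional slow frames, which matter more than the average for a real-time DMS pipeline.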

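6. Summary

Key point	Description
BlazeBlock	Lightweight residual block built on 5×5 depthwise convolutions
Anchor	896 anchors covering faces of roughly 8-64 px
Keypoints	6 facial keypoints predicted alongside each box
IMS integration	Face detection + cropping + downstream processing

Series progress: 26/55
Last updated: 2026-03-12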

https://dapalm.com/2026/03/12/MediaPipe系列26-Face-Detection：BlazeFace架构解析/

Author: Mars

Published: March 12, 2026
