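MediaPipe Series 27: Face Mesh, a Complete Guide to 468-Point Facial Landmarks

Preface: Why Face Mesh?

27.1 Why Face Mesh Matters

Accurate facial landmarks are the foundation of driver and occupant monitoring systems (DMS/OMS):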

┌──────────────────────────────────────────────────────────────────┐
│                      Why Face Mesh Matters                       │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│   Problem: how do we obtain accurate facial landmarks?           │
│                                                                  │
│   ┌────────────────────────────────────────────────────────┐     │
│   │  An IMS DMS needs accurate facial landmarks for:       │     │
│   │                                                        │     │
│   │   • Fatigue detection: eye aspect ratio (EAR)          │     │
│   │   • Distraction detection: gaze direction estimation   │     │
│   │   • Yawn detection: mouth opening                      │     │
│   │   • Head pose: yaw/pitch/roll estimation               │     │
│   │   • Identity recognition: facial feature extraction    │     │
│   │                                                        │     │
│   └────────────────────────────────────────────────────────┘     │
│                                                                  │
│   Face Mesh provides:                                            │
│   ┌────────────────────────────────────────────────────────┐     │
│   │                                                        │     │
│   │   • 468 3D landmarks                                   │     │
│   │   • Pixel-level accuracy (error < 5%)                  │     │
│   │   • Real-time performance (~3 ms on GPU)               │     │
│   │   • Lightweight model (~2.8 MB)                        │     │
│   │                                                        │     │
│   └────────────────────────────────────────────────────────┘     │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

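27.2 Features

Feature	Description
Landmark count	468 3D points
Coordinate system	Normalized x, y in [0, 1] plus relative depth z
Accuracy	Pixel-level, error < 5%
Speed	~3 ms (GPU), ~10 ms (CPU)
Model size	~2.8 MB (TFLite)

27.3 Use Cases

Application	Landmarks used	Purpose
Fatigue detection	Eye contours (16 points per eye)	EAR computation, blink rate
Distraction detection	Eyes + head	Gaze direction, head pose
Yawn detection	Mouth region (40 points)	Mouth opening
Expression recognition	All 468 points	Emotion analysis
Identity recognition	Key feature points	Face feature vector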

28. Architecture in Detail

28.1 Full Pipeline

┌──────────────────────────────────────────────────────────────────┐
│                        Face Mesh Pipeline                        │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│   Input layer                                                    │
│   ┌────────────────────────────────────────────────────────┐     │
│   │                      Input Image                       │     │
│   │                    (RGB, any size)                     │     │
│   └────────────────────────────────────────────────────────┘     │
│                                │                                 │
│                                ▼                                 │
│   Stage 1: face detection                                        │
│   ┌────────────────────────────────────────────────────────┐     │
│   │                                                        │     │
│   │   BlazeFace Short Range                                │     │
│   │   • Input: 128×128 RGB                                 │     │
│   │   • Output: face bounding box + 6 keypoints            │     │
│   │   • Latency: ~1 ms (GPU)                               │     │
│   │                                                        │     │
│   └────────────────────────────────────────────────────────┘     │
│                                │                                 │
│                                ▼                                 │
│   Stage 2: face alignment                                        │
│   ┌────────────────────────────────────────────────────────┐     │
│   │                                                        │     │
│   │   Face Alignment                                       │     │
│   │   • Align the face using the 6 keypoints               │     │
│   │   • Crop the face region                               │     │
│   │   • Resize to the 192×192 model input                  │     │
│   │                                                        │     │
│   └────────────────────────────────────────────────────────┘     │
│                                │                                 │
│                                ▼                                 │
│   Stage 3: landmark detection                                    │
│   ┌────────────────────────────────────────────────────────┐     │
│   │                                                        │     │
│   │   Face Landmark Model                                  │     │
│   │   • Input: 192×192 RGB                                 │     │
│   │   • Model: TFLite (~2.8 MB)                            │     │
│   │   • Output: 468 × 3 (x, y, z)                          │     │
│   │   • Latency: ~2 ms (GPU)                               │     │
│   │                                                        │     │
│   └────────────────────────────────────────────────────────┘     │
│                                │                                 │
│                                ▼                                 │
│   Output layer                                                   │
│   ┌────────────────────────────────────────────────────────┐     │
│   │                                                        │     │
│   │   468 3D landmarks                                     │     │
│   │   • x, y: normalized coordinates (0-1)                 │     │
│   │   • z: relative depth (not metric depth)               │     │
│   │                                                        │     │
│   └────────────────────────────────────────────────────────┘     │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
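
Because the output layer reports x and y in normalized coordinates, drawing or pixel-space geometry first needs a conversion back to pixels. The following is a minimal sketch of that step. `NormalizedPoint`, `PixelPoint`, and `ToPixel` are hypothetical names used only for illustration (they are not MediaPipe types), and `std::clamp` assumes C++17.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical stand-in for one Face Mesh output point.
struct NormalizedPoint {
  float x;  // normalized by image width
  float y;  // normalized by image height
  float z;  // relative depth, not a metric distance
};

struct PixelPoint {
  int x;
  int y;
};

// Maps a normalized landmark to integer pixel coordinates.
// Results are clamped so that slightly out-of-frame landmarks stay drawable.
inline PixelPoint ToPixel(const NormalizedPoint& p, int width, int height) {
  const int px = static_cast<int>(std::lround(p.x * width));
  const int py = static_cast<int>(std::lround(p.y * height));
  return {std::clamp(px, 0, width - 1), std::clamp(py, 0, height - 1)};
}
```

For a 640×480 frame, a landmark at (0.5, 0.25) lands on pixel (320, 120). The z value is left untouched here because it is only meaningful relative to other landmarks of the same face.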

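28.2 Graph Configuration

# ========== Face Mesh graph configuration ==========
# Simplified, conceptual sketch modeled on
# mediapipe/graphs/face_mesh/face_mesh_desktop_live.pbtxt.
# Some calculator names are illustrative and differ from the shipped graph.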

input_stream: "INPUT:Input"

output_stream: "LANDMARKS:multi_face_landmarks"
output_stream: "FACE_RECTS:face_rects"

# ========== 1. Image format conversion ==========
node {
  calculator: "ImageTransformationCalculator"
  input_stream: "INPUT:Input"
  output_stream: "IMAGE:converted_image"
  options {
    [mediapipe.ImageTransformationCalculatorOptions.ext] {
      output_format: SRGB
    }
  }
}

# ========== 2. Face detection ==========
node {
  calculator: "FaceDetectionShortRangeGpu"
  input_stream: "IMAGE:converted_image"
  output_stream: "DETECTIONS:detections"
}

# ========== 3. Face alignment and cropping ==========
node {
  calculator: "FaceGeometryFromDetectionCalculator"
  input_stream: "DETECTIONS:detections"
  output_stream: "FACE_GEOMETRY:face_geometry"
  input_stream: "IMAGE_SIZE:image_size"
}

node {
  calculator: "ImageTransformationCalculator"
  input_stream: "IMAGE:converted_image"
  input_stream: "FACE_GEOMETRY:face_geometry"
  output_stream: "IMAGE:aligned_face"
  options {
    [mediapipe.ImageTransformationCalculatorOptions.ext] {
      output_width: 192
      output_height: 192
      scale_mode: FIT
    }
  }
}

# ========== 4. Landmark detection ==========
node {
  calculator: "TfLiteInferenceCalculator"
  input_stream: "IMAGE:aligned_face"
  output_stream: "TENSORS:landmark_tensors"
  options {
    [mediapipe.TfLiteInferenceCalculatorOptions.ext] {
      model_path: "/models/face_landmark.tflite"
      delegate {
        gpu {
          use_advanced_gpu_api: true
        }
      }
    }
  }
}

# ========== 5. Post-processing ==========
node {
  calculator: "FaceLandmarksFromTensorCalculator"
  input_stream: "TENSORS:landmark_tensors"
  input_stream: "FACE_GEOMETRY:face_geometry"
  output_stream: "LANDMARKS:multi_face_landmarks"
  options {
    [mediapipe.FaceLandmarksFromTensorCalculatorOptions.ext] {
      num_landmarks: 468
    }
  }
}

29. Landmark Layout

29.1 Facial Regions

┌──────────────────────────────────────────────────────────────────┐
│                     468-Landmark Region Map                      │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│   Indices are not contiguous; the values below are examples.     │
│   "Left" and "right" refer to the side of the image.             │
│                                                                  │
│   Eyes                                                           │
│   ┌────────────────────────────────────────────────────────┐     │
│   │   Left eye:                                            │     │
│   │   • Contour: 16 points (corners 33 and 133)            │     │
│   │   • Iris: 5 points (468-472, refined model only)       │     │
│   │                                                        │     │
│   │   Right eye:                                           │     │
│   │   • Contour: 16 points (corners 362 and 263)           │     │
│   │   • Iris: 5 points (473-477, refined model only)       │     │
│   └────────────────────────────────────────────────────────┘     │
│                                                                  │
│   Eyebrows                                                       │
│   ┌────────────────────────────────────────────────────────┐     │
│   │   Left eyebrow: 10 points                              │     │
│   │   Right eyebrow: 10 points                             │     │
│   └────────────────────────────────────────────────────────┘     │
│                                                                  │
│   Mouth                                                          │
│   ┌────────────────────────────────────────────────────────┐     │
│   │   Outer lip: 20 points (corners 61 and 291)            │     │
│   │   Inner lip: 20 points (corners 78 and 308)            │     │
│   │   Total: 40 points (no tongue landmarks)               │     │
│   └────────────────────────────────────────────────────────┘     │
│                                                                  │
│   Nose                                                           │
│   ┌────────────────────────────────────────────────────────┐     │
│   │   Bridge: 4 points (6, 197, 195, 5)                    │     │
│   │   Tip: 1 point (index 1)                               │     │
│   │   Nostril wings: 5 points each                         │     │
│   └────────────────────────────────────────────────────────┘     │
│                                                                  │
│   Face contour                                                   │
│   ┌────────────────────────────────────────────────────────┐     │
│   │   Face oval: 36 points (chin center is 152)            │     │
│   │   Left cheek: 10 points                                │     │
│   │   Right cheek: 10 points                               │     │
│   └────────────────────────────────────────────────────────┘     │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

29.2 Landmark Indices in Detail

// ========== Landmark index definitions ==========
// Illustrative constants for this article; they are not part of the MediaPipe
// source tree. "Left"/"right" refer to the side of the image, so LEFT_EYE is
// the eye that MediaPipe's own FACEMESH_RIGHT_EYE set describes.

namespace face_mesh {
  // ========== Eyes ==========
  // Left eye contour subset (clockwise, starting at the outer corner)
  constexpr int LEFT_EYE[] = {33, 160, 158, 133, 155, 154, 153, 145, 144};
  
  // Right eye contour subset (clockwise, starting at the inner corner)
  constexpr int RIGHT_EYE[] = {362, 385, 387, 263, 373, 374, 380, 381, 382};
  
  // Left iris center (requires refined landmarks)
  constexpr int LEFT_IRIS_CENTER = 468;
  
  // Right iris center (requires refined landmarks)
  constexpr int RIGHT_IRIS_CENTER = 473;
  
  // ========== Eyebrows ==========
  // Same image-side convention as the eyes: 46... sits above landmark 33.
  constexpr int LEFT_EYEBROW[] = {46, 53, 52, 65, 55};
  constexpr int RIGHT_EYEBROW[] = {276, 283, 282, 295, 285};
  
  // ========== Mouth (outer lip contour) ==========
  constexpr int UPPER_LIP[] = {61, 185, 40, 39, 37, 0, 267, 269, 270, 409, 291};
  constexpr int LOWER_LIP[] = {146, 91, 181, 84, 17, 314, 405, 321, 375, 291};
  
  // ========== Nose ==========
  constexpr int NOSE_TIP = 1;
  constexpr int NOSE_BOTTOM = 2;
  constexpr int NOSE_BRIDGE[] = {6, 197, 195, 5};
  
  // ========== Head pose reference points ==========
  constexpr int FOREHEAD = 10;       // forehead center
  constexpr int CHIN = 152;         // chin center
  constexpr int LEFT_TEMPLE = 234;  // left temple
  constexpr int RIGHT_TEMPLE = 454; // right temple
}
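
The mouth indices above are what yawn detection builds on: mouth opening can be expressed as a mouth aspect ratio (MAR), the lip gap divided by the mouth width. The sketch below is one plausible way to compute it and is not taken from MediaPipe. It assumes landmarks 13 and 14 are the centers of the upper and lower inner lip (these two indices are not listed in this article), uses the mouth corners 61 and 291 from the lip arrays, and expects pixel coordinates. `Point2f`, `Distance`, and `MouthAspectRatio` are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Point2f {
  float x;
  float y;
};

// Landmark indices assumed for this sketch.
constexpr std::size_t kUpperInnerLip = 13;  // assumption: not listed in the article
constexpr std::size_t kLowerInnerLip = 14;  // assumption: not listed in the article
constexpr std::size_t kMouthCornerA = 61;
constexpr std::size_t kMouthCornerB = 291;

inline float Distance(const Point2f& a, const Point2f& b) {
  return std::hypot(a.x - b.x, a.y - b.y);
}

// Mouth aspect ratio: vertical lip gap divided by mouth width.
// `points` holds pixel coordinates indexed by Face Mesh landmark index.
inline float MouthAspectRatio(const std::vector<Point2f>& points) {
  if (points.size() <= kMouthCornerB) {
    return 0.0f;  // not a full Face Mesh result
  }
  const float width = Distance(points[kMouthCornerA], points[kMouthCornerB]);
  if (width < 1e-6f) {
    return 0.0f;  // degenerate mouth width
  }
  return Distance(points[kUpperInnerLip], points[kLowerInnerLip]) / width;
}
```

A mouth that is 60 px wide with a 30 px lip gap yields a MAR of 0.5. Any yawn threshold on this value would have to be tuned on real data; none is suggested here.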

30. EAR Computation

30.1 How the Eye Aspect Ratio Works

┌──────────────────────────────────────────────────────────────────┐
│                       How EAR Is Computed                        │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│   Eye landmarks:                                                 │
│                                                                  │
│                p2         p3                                     │
│                ●───────────●                                     │
│       p1 ●                       ● p4                            │
│                ●───────────●                                     │
│                p6         p5                                     │
│                                                                  │
│   EAR (Eye Aspect Ratio) formula:                                │
│                                                                  │
│                  |p2 - p6| + |p3 - p5|                           │
│   EAR =     ─────────────────────────────────                    │
│                      2 × |p1 - p4|                               │
│                                                                  │
│   Interpretation:                                                │
│   • Numerator: sum of the two vertical eyelid distances          │
│   • Denominator: twice the horizontal eye width                  │
│   • Smaller EAR → eye more closed                                │
│   • Larger EAR → eye more open                                   │
│                                                                  │
│   Typical values:                                                │
│   • Fully open: EAR ≈ 0.25 - 0.35                                │
│   • Half closed: EAR ≈ 0.15 - 0.25                               │
│   • Fully closed: EAR < 0.15                                     │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
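
As a worked example of the ratio, take an eye that is 30 px wide with both eyelid gaps at 9 px: EAR = (9 + 9) / (2 × 30) = 0.3, inside the "fully open" band. With 1.5 px gaps the value drops to 0.05. The standalone sketch below reproduces this arithmetic using the standard six-point EAR, which pairs p2 with p6 and p3 with p5. `EyePoint` and `EyeAspectRatio` are hypothetical names, independent of MediaPipe.

```cpp
#include <array>
#include <cmath>

struct EyePoint {
  float x;
  float y;
};

// Eye aspect ratio from six points ordered p1..p6:
// p1/p4 are the eye corners, p2/p3 lie on the upper lid, p6/p5 on the lower lid.
inline float EyeAspectRatio(const std::array<EyePoint, 6>& p) {
  auto dist = [](const EyePoint& a, const EyePoint& b) {
    return std::hypot(a.x - b.x, a.y - b.y);
  };
  const float horizontal = dist(p[0], p[3]);  // |p1 - p4|
  if (horizontal < 1e-6f) {
    return 0.0f;  // degenerate eye width
  }
  const float vertical = dist(p[1], p[5]) + dist(p[2], p[4]);  // |p2-p6| + |p3-p5|
  return vertical / (2.0f * horizontal);
}
```

The points should be in pixel units (or in any coordinate system with equal x and y scale); otherwise the ratio is skewed by the image aspect ratio.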

30.2 EAR Calculator Implementation

// ear_calculator.h
#ifndef MEDIAPIPE_CALCULATORS_IMS_EAR_CALCULATOR_H_
#define MEDIAPIPE_CALCULATORS_IMS_EAR_CALCULATOR_H_

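#include <cmath>
#include <vector>

#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/landmark.pb.h"

namespace mediapipe {

// ========== EAR output message ==========
// Protobuf definition. It belongs in a .proto file whose generated header is
// included here; it is shown as a comment because it is not valid C++.
//
//   message EARResult {
//     float left_ear = 1;
//     float right_ear = 2;
//     float avg_ear = 3;
//     bool left_eye_closed = 4;
//     bool right_eye_closed = 5;
//     bool both_eyes_closed = 6;
//     uint64 timestamp_ms = 7;
//   }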

// ========== EAR Calculator ==========
class EARCalculator : public CalculatorBase {
 public:
  static absl::Status GetContract(CalculatorContract* cc) {
    cc->Inputs().Tag("LANDMARKS").Set<std::vector<NormalizedLandmarkList>>();
    cc->Outputs().Tag("EAR").Set<EARResult>();
    
    cc->Options<EAROptions>();
    return absl::OkStatus();
  }

  absl::Status Open(CalculatorContext* cc) override {
    const auto& options = cc->Options<EAROptions>();
    
    ear_threshold_ = options.ear_threshold();
    
    // Six landmarks per eye, ordered p1..p6 for the EAR formula:
    // p1/p4 are the eye corners, p2/p3 lie on the upper lid, and p6/p5 lie
    // on the lower lid directly below p2/p3.
    
    // p1..p6 (left eye)
    left_eye_indices_ = {33, 160, 158, 133, 153, 144};
    
    // p1..p6 (right eye)
    right_eye_indices_ = {362, 385, 387, 263, 373, 380};
    
    return absl::OkStatus();
  }

  absl::Status Process(CalculatorContext* cc) override {
    if (cc->Inputs().Tag("LANDMARKS").IsEmpty()) {
      return absl::OkStatus();
    }

    const auto& face_landmarks = 
        cc->Inputs().Tag("LANDMARKS").Get<std::vector<NormalizedLandmarkList>>();
    
    if (face_landmarks.empty()) {
      return absl::OkStatus();
    }

    const auto& landmarks = face_landmarks[0];  // use the first face

    // ========== Compute EAR ==========
    float left_ear = CalculateEAR(landmarks, left_eye_indices_);
    float right_ear = CalculateEAR(landmarks, right_eye_indices_);
    float avg_ear = (left_ear + right_ear) / 2.0f;

    // ========== Classify the eye state ==========
    bool left_closed = left_ear < ear_threshold_;
    bool right_closed = right_ear < ear_threshold_;
    bool both_closed = left_closed && right_closed;

    // ========== Build the output ==========
    EARResult result;
    result.set_left_ear(left_ear);
    result.set_right_ear(right_ear);
    result.set_avg_ear(avg_ear);
    result.set_left_eye_closed(left_closed);
    result.set_right_eye_closed(right_closed);
    result.set_both_eyes_closed(both_closed);
    result.set_timestamp_ms(cc->InputTimestamp().Value() / 1000);

    cc->Outputs().Tag("EAR").AddPacket(
        MakePacket<EARResult>(result).At(cc->InputTimestamp()));

    VLOG(1) << "EAR: left=" << left_ear << ", right=" << right_ear 
            << ", avg=" << avg_ear;

    return absl::OkStatus();
  }

 private:
  float ear_threshold_ = 0.2f;
  std::vector<int> left_eye_indices_;
  std::vector<int> right_eye_indices_;

  float CalculateEAR(const NormalizedLandmarkList& landmarks, 
                     const std::vector<int>& indices) {
    // Euclidean distance between two of the six eye points (0-based positions
    // in `indices`). Note: x and y are normalized by image width and height
    // respectively, so on non-square images the ratio is skewed by the aspect
    // ratio; scale to pixels first if the thresholds assume pixel geometry.
    auto dist = [&](int a, int b) {
      const auto& pa = landmarks.landmark(indices[a]);
      const auto& pb = landmarks.landmark(indices[b]);
      return std::hypot(pa.x() - pb.x(), pa.y() - pb.y());
    };

    // Compute EAR
    const float vertical_1 = dist(1, 5);  // |p2 - p6|
    const float vertical_2 = dist(2, 4);  // |p3 - p5|
    const float horizontal = dist(0, 3);  // |p1 - p4|

    if (horizontal < 0.001f) {
      return 0.0f;  // avoid division by zero
    }

    return (vertical_1 + vertical_2) / (2.0f * horizontal);
  }
};

REGISTER_CALCULATOR(EARCalculator);

}  // namespace mediapipe

#endif  // MEDIAPIPE_CALCULATORS_IMS_EAR_CALCULATOR_H_

31. Head Pose Estimation

31.1 How 6DoF Pose Estimation Works

┌──────────────────────────────────────────────────────────────────┐
│                    How Head Pose Is Estimated                    │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│   Input: Face Mesh landmarks (x, y are used)                     │
│                                                                  │
│   Method: solvePnP (Perspective-n-Point)                         │
│                                                                  │
│   Known:                                                         │
│   • 3D model points (generic face model)                         │
│   • 2D image points (Face Mesh detections)                       │
│   • Camera intrinsics                                            │
│                                                                  │
│   Solve for:                                                     │
│   • Rotation matrix R (3×3)                                      │
│   • Translation vector t (3×1)                                   │
│                                                                  │
│   Output:                                                        │
│   • Pitch: nodding down/up                                       │
│   • Yaw: turning left/right                                      │
│   • Roll: tilting left/right                                     │
│                                                                  │
│   Pose range:                                                    │
│   • Pitch: -90° ~ +90° (head down to head up)                    │
│   • Yaw: -90° ~ +90° (left turn to right turn)                   │
│   • Roll: -45° ~ +45° (left tilt to right tilt)                  │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
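
The step from the rotation matrix R to pitch, yaw, and roll can be checked in isolation, without OpenCV. The sketch below assumes the ZYX convention R = Rz(roll) · Ry(yaw) · Rx(pitch), with pitch about the x axis, yaw about the y axis, and roll about the z axis. It composes a matrix from known angles and recovers them again. `Mat3`, `EulerDeg`, `ComposeZYX`, and `DecomposeZYX` are hypothetical helper names.

```cpp
#include <array>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;

struct EulerDeg {
  double pitch;  // rotation about x, degrees
  double yaw;    // rotation about y, degrees
  double roll;   // rotation about z, degrees
};

constexpr double kPi = 3.14159265358979323846;

// Builds R = Rz(roll) * Ry(yaw) * Rx(pitch).
inline Mat3 ComposeZYX(const EulerDeg& e) {
  const double a = e.pitch * kPi / 180.0;
  const double b = e.yaw * kPi / 180.0;
  const double c = e.roll * kPi / 180.0;
  const double sa = std::sin(a), ca = std::cos(a);
  const double sb = std::sin(b), cb = std::cos(b);
  const double sc = std::sin(c), cc = std::cos(c);
  Mat3 r{};
  r[0] = {cc * cb, cc * sb * sa - sc * ca, cc * sb * ca + sc * sa};
  r[1] = {sc * cb, sc * sb * sa + cc * ca, sc * sb * ca - cc * sa};
  r[2] = {-sb, cb * sa, cb * ca};
  return r;
}

// Recovers the angles from R; falls back to roll = 0 near gimbal lock.
inline EulerDeg DecomposeZYX(const Mat3& r) {
  const double sy = std::hypot(r[0][0], r[1][0]);
  EulerDeg e{};
  if (sy > 1e-6) {
    e.pitch = std::atan2(r[2][1], r[2][2]);
    e.yaw = std::atan2(-r[2][0], sy);
    e.roll = std::atan2(r[1][0], r[0][0]);
  } else {  // yaw is near ±90°, so pitch and roll are not separable
    e.pitch = std::atan2(-r[1][2], r[1][1]);
    e.yaw = std::atan2(-r[2][0], sy);
    e.roll = 0.0;
  }
  e.pitch *= 180.0 / kPi;
  e.yaw *= 180.0 / kPi;
  e.roll *= 180.0 / kPi;
  return e;
}
```

A round trip such as `DecomposeZYX(ComposeZYX({10.0, -25.0, 5.0}))` returns the original angles up to floating-point error, which is a quick way to confirm that an angle convention matches before trusting live pose values.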

31.2 Head Pose Calculator Implementation

// head_pose_calculator.h
#ifndef MEDIAPIPE_CALCULATORS_IMS_HEAD_POSE_CALCULATOR_H_
#define MEDIAPIPE_CALCULATORS_IMS_HEAD_POSE_CALCULATOR_H_

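#include <cmath>
#include <vector>

#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/landmark.pb.h"
#include <opencv2/opencv.hpp>

namespace mediapipe {

// ========== Head pose message ==========
// Protobuf definition. It belongs in a .proto file whose generated header is
// included here; it is shown as a comment because it is not valid C++.
//
//   message HeadPose {
//     float pitch = 1;  // pitch angle (degrees)
//     float yaw = 2;    // yaw angle (degrees)
//     float roll = 3;   // roll angle (degrees)
//     uint64 timestamp_ms = 4;
//   }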

// ========== Head Pose Calculator ==========
class HeadPoseCalculator : public CalculatorBase {
 public:
  static absl::Status GetContract(CalculatorContract* cc) {
    cc->Inputs().Tag("LANDMARKS").Set<std::vector<NormalizedLandmarkList>>();
    cc->Outputs().Tag("POSE").Set<HeadPose>();
    
    cc->Options<HeadPoseOptions>();
    return absl::OkStatus();
  }

  absl::Status Open(CalculatorContext* cc) override {
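    // Initialize the 3D model points of a generic face model.
    // Units are arbitrary model units rather than calibrated millimeters;
    // the scale only affects the translation, not the Euler angles.
    model_points_ = {
      cv::Point3d(0.0, 0.0, 0.0),           // nose tip
      cv::Point3d(0.0, -330.0, -65.0),      // chin
      cv::Point3d(-225.0, 170.0, -135.0),   // left eye outer corner
      cv::Point3d(225.0, 170.0, -135.0),    // right eye outer corner
      cv::Point3d(-150.0, -150.0, -125.0),  // left mouth corner
      cv::Point3d(150.0, -150.0, -125.0)    // right mouth corner
    };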
    
    return absl::OkStatus();
  }

  absl::Status Process(CalculatorContext* cc) override {
    if (cc->Inputs().Tag("LANDMARKS").IsEmpty()) {
      return absl::OkStatus();
    }

    const auto& face_landmarks = 
        cc->Inputs().Tag("LANDMARKS").Get<std::vector<NormalizedLandmarkList>>();
    
    if (face_landmarks.empty()) {
      return absl::OkStatus();
    }

    const auto& landmarks = face_landmarks[0];

    // ========== 1. Collect the 2D image points (pixels) ==========
    // Landmarks are normalized to [0, 1]; scale them to pixels so that they
    // match the camera matrix below. The image size is assumed to be 640×480.
    constexpr double kImageWidth = 640.0;
    constexpr double kImageHeight = 480.0;
    // Same order as model_points_: nose tip, chin, left eye outer corner,
    // right eye outer corner, left mouth corner, right mouth corner.
    constexpr int kPoseIndices[] = {1, 152, 33, 263, 61, 291};

    std::vector<cv::Point2d> image_points;
    for (int index : kPoseIndices) {
      const auto& lm = landmarks.landmark(index);
      image_points.emplace_back(lm.x() * kImageWidth, lm.y() * kImageHeight);
    }

    // ========== 2. Camera intrinsics ==========
    // Approximation: focal length = image width, principal point = image center.
    const double focal_length = kImageWidth;
    const cv::Point2d center(kImageWidth / 2.0, kImageHeight / 2.0);
    cv::Mat camera_matrix = (cv::Mat_<double>(3, 3) << 
        focal_length, 0, center.x,
        0, focal_length, center.y,
        0, 0, 1);
    
    cv::Mat dist_coeffs = cv::Mat::zeros(4, 1, CV_64F);

    // ========== 3. solvePnP ==========
    cv::Mat rotation_vector;
    cv::Mat translation_vector;
    
    bool success = cv::solvePnP(
        model_points_, 
        image_points,
        camera_matrix, 
        dist_coeffs,
        rotation_vector, 
        translation_vector);
    
    if (!success) {
      LOG(WARNING) << "solvePnP failed";
      return absl::OkStatus();
    }

    // ========== 4. Convert to Euler angles ==========
    cv::Mat rotation_matrix;
    cv::Rodrigues(rotation_vector, rotation_matrix);
    
    double pitch, yaw, roll;
    RotationMatrixToEulerAngles(rotation_matrix, pitch, yaw, roll);

    // ========== 5. Output ==========
    HeadPose pose;
    pose.set_pitch(static_cast<float>(pitch));
    pose.set_yaw(static_cast<float>(yaw));
    pose.set_roll(static_cast<float>(roll));
    pose.set_timestamp_ms(cc->InputTimestamp().Value() / 1000);

    cc->Outputs().Tag("POSE").AddPacket(
        MakePacket<HeadPose>(pose).At(cc->InputTimestamp()));

    VLOG(1) << "Head pose: pitch=" << pitch << ", yaw=" << yaw << ", roll=" << roll;

    return absl::OkStatus();
  }

 private:
  std::vector<cv::Point3d> model_points_;
  
  void RotationMatrixToEulerAngles(const cv::Mat& R, 
                                    double& pitch, double& yaw, double& roll) {
    // Compute Euler angles (ZYX order)
    double sy = std::sqrt(R.at<double>(0, 0) * R.at<double>(0, 0) + 
                         R.at<double>(1, 0) * R.at<double>(1, 0));
    
    bool singular = sy < 1e-6;
    
    if (!singular) {
      pitch = std::atan2(R.at<double>(2, 1), R.at<double>(2, 2));
      yaw = std::atan2(-R.at<double>(2, 0), sy);
      roll = std::atan2(R.at<double>(1, 0), R.at<double>(0, 0));
    } else {
      pitch = std::atan2(-R.at<double>(1, 2), R.at<double>(1, 1));
      yaw = std::atan2(-R.at<double>(2, 0), sy);
      roll = 0;
    }
    
    // Convert radians to degrees
    pitch = pitch * 180.0 / CV_PI;
    yaw = yaw * 180.0 / CV_PI;
    roll = roll * 180.0 / CV_PI;
  }
};

REGISTER_CALCULATOR(HeadPoseCalculator);

}  // namespace mediapipe

#endif  // MEDIAPIPE_CALCULATORS_IMS_HEAD_POSE_CALCULATOR_H_

32. IMS in Practice: Fatigue Detection

32.1 Complete DMS Fatigue Detection Graph

# ims_fatigue_detection_graph.pbtxt

input_stream: "IR_IMAGE:ir_image"
output_stream: "FATIGUE_RESULT:fatigue_result"
output_stream: "ALERT:alert"

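# ========== 1. Face Mesh ==========
# (FaceMeshGpu and FaceMeshOptions are conceptual stand-ins for the face
# landmark subgraph; the option names mirror the Python solution API.)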
node {
  calculator: "FaceMeshGpu"
  input_stream: "IMAGE:ir_image"
  output_stream: "LANDMARKS:multi_face_landmarks"
  output_stream: "FACE_RECTS:face_rects"
  options {
    [mediapipe.FaceMeshOptions.ext] {
      max_num_faces: 1
      refine_landmarks: true
      min_detection_confidence: 0.5
      min_tracking_confidence: 0.5
    }
  }
}

# ========== 2. EAR computation ==========
node {
  calculator: "EARCalculator"
  input_stream: "LANDMARKS:multi_face_landmarks"
  output_stream: "EAR:ear_result"
  options {
    [mediapipe.EAROptions.ext] {
      ear_threshold: 0.2
    }
  }
}

# ========== 3. Head pose estimation ==========
node {
  calculator: "HeadPoseCalculator"
  input_stream: "LANDMARKS:multi_face_landmarks"
  output_stream: "POSE:head_pose"
}

# ========== 4. Blink detection ==========
node {
  calculator: "BlinkDetectorCalculator"
  input_stream: "EAR:ear_result"
  output_stream: "BLINK:blink_result"
  options {
    [mediapipe.BlinkDetectorOptions.ext] {
      ear_threshold: 0.2
      min_blink_frames: 2
      max_blink_frames: 10
    }
  }
}

# ========== 5. PERCLOS computation ==========
node {
  calculator: "PERCLOSCalculator"
  input_stream: "EAR:ear_result"
  output_stream: "PERCLOS:perclos"
  options {
    [mediapipe.PERCLOSOptions.ext] {
      window_frames: 30
      closed_threshold: 0.2
    }
  }
}

# ========== 6. Combined fatigue decision ==========
node {
  calculator: "FatigueDecisionCalculator"
  input_stream: "EAR:ear_result"
  input_stream: "POSE:head_pose"
  input_stream: "BLINK:blink_result"
  input_stream: "PERCLOS:perclos"
  output_stream: "FATIGUE_RESULT:fatigue_result"
  output_stream: "ALERT:alert"
  options {
    [mediapipe.FatigueDecisionOptions.ext] {
      perclos_threshold: 0.15
      blink_rate_low: 5.0
      blink_rate_high: 30.0
      head_pose_threshold: 30.0
    }
  }
}
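
The graph references a BlinkDetectorCalculator without showing its logic. One plausible reading of its three options is a small state machine: count consecutive frames with EAR below `ear_threshold`, and when the eye reopens, register a blink only if the closure lasted between `min_blink_frames` and `max_blink_frames`. The sketch below illustrates that idea outside the calculator framework. `BlinkCounter` is a hypothetical class, not the calculator's actual implementation.

```cpp
// Counts blinks from a per-frame EAR stream.
class BlinkCounter {
 public:
  BlinkCounter(float ear_threshold, int min_frames, int max_frames)
      : ear_threshold_(ear_threshold),
        min_frames_(min_frames),
        max_frames_(max_frames) {}

  // Feeds one frame. Returns true on the frame where a blink completes,
  // i.e. the eye reopens after a closure of acceptable length.
  bool Update(float ear) {
    if (ear < ear_threshold_) {
      ++closed_frames_;  // eye still closed: keep counting
      return false;
    }
    const bool is_blink =
        closed_frames_ >= min_frames_ && closed_frames_ <= max_frames_;
    closed_frames_ = 0;
    if (is_blink) {
      ++blink_count_;
    }
    return is_blink;
  }

  int blink_count() const { return blink_count_; }

 private:
  float ear_threshold_;
  int min_frames_;
  int max_frames_;
  int closed_frames_ = 0;
  int blink_count_ = 0;
};
```

With the graph's settings (0.2, 2, 10), a single noisy frame below the threshold is ignored, and a closure longer than 10 frames is treated as prolonged eye closure rather than a blink. A blink rate per minute would follow from dividing `blink_count()` by the elapsed time.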

32.2 Fatigue Decision Logic

// fatigue_decision_calculator.cc

absl::Status FatigueDecisionCalculator::Process(CalculatorContext* cc) {
  // ========== Collect inputs ==========
  // Skip timestamps where any upstream result is missing.
  for (const char* tag : {"EAR", "POSE", "BLINK", "PERCLOS"}) {
    if (cc->Inputs().Tag(tag).IsEmpty()) {
      return absl::OkStatus();
    }
  }
  const auto& ear = cc->Inputs().Tag("EAR").Get<EARResult>();
  const auto& pose = cc->Inputs().Tag("POSE").Get<HeadPose>();
  const auto& blink = cc->Inputs().Tag("BLINK").Get<BlinkResult>();
  const auto& perclos = cc->Inputs().Tag("PERCLOS").Get<float>();

  // ========== Compute the fatigue score ==========
  float fatigue_score = 0.0f;
  
  // 1. PERCLOS contribution (weight 0.4, saturating at PERCLOS = 0.5)
  if (perclos > perclos_threshold_) {
    fatigue_score += 0.4f * std::min(1.0f, perclos / 0.5f);
  }
  
  // 2. Abnormal blink rate (weight 0.2)
  if (blink.blink_rate() < blink_rate_low_ || 
      blink.blink_rate() > blink_rate_high_) {
    fatigue_score += 0.2f;
  }
  
  // 3. Abnormal head pose (weight 0.2)
  if (std::abs(pose.pitch()) > head_pose_threshold_ ||
      std::abs(pose.yaw()) > head_pose_threshold_) {
    fatigue_score += 0.2f;
  }
  
  // 4. Eyes closed (weight 0.2)
  if (ear.both_eyes_closed()) {
    fatigue_score += 0.2f;
  }

  // Clamp to [0, 1]
  fatigue_score = std::min(1.0f, fatigue_score);

  // ========== Map the score to a fatigue level ==========
  int fatigue_level = 0;
  if (fatigue_score > 0.8f) {
    fatigue_level = 3;  // severe fatigue
  } else if (fatigue_score > 0.5f) {
    fatigue_level = 2;  // clear fatigue
  } else if (fatigue_score > 0.3f) {
    fatigue_level = 1;  // mild fatigue
  }

  // ========== Output ==========
  FatigueResult result;
  result.set_fatigue_score(fatigue_score);
  result.set_fatigue_level(fatigue_level);
  result.set_perclos(perclos);
  result.set_blink_rate(blink.blink_rate());
  result.set_head_pitch(pose.pitch());
  result.set_head_yaw(pose.yaw());
  result.set_ear_avg(ear.avg_ear());

  cc->Outputs().Tag("FATIGUE_RESULT").AddPacket(
      MakePacket<FatigueResult>(result).At(cc->InputTimestamp()));

  bool alert = fatigue_level >= 2;
  cc->Outputs().Tag("ALERT").AddPacket(
      MakePacket<bool>(alert).At(cc->InputTimestamp()));

  return absl::OkStatus();
}

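33. Summary

Key point	Description
Landmark count	468 3D points
Eye region	16 contour points per eye, plus 10 iris points (5 per eye) with refinement
EAR computation	Eye aspect ratio; indicates how open the eye is
Head pose	yaw/pitch/roll estimated with solvePnP
IMS applications	Fatigue detection, distraction detection

Next Up

MediaPipe Series 28: Hand Tracking, Hand Detection and Tracking

A deep dive into hand landmark detection, gesture recognition, and IMS gesture-interaction applications.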

References

Google AI Edge. Face Mesh
MediaPipe. Face Mesh Paper
T. Soukupova et al. "Eye Blink Detection using Facial Landmarks"

Series progress: 27/55
Last updated: 2026-03-12

https://dapalm.com/2026/03/13/MediaPipe系列27-Face-Mesh：468点人脸关键点/

Author: Mars
Published: March 13, 2026
