A Unified DMS Framework for Fatigue, Distraction, and Emotion Detection: A Real-Time System Based on MobileNetV2 and MediaPipe

Posted 2026-04-17 · Updated 2026-04-18 · Categories: IMS Technology, DMS Algorithms

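An in-depth look at a recent (2026) unified DMS detection framework that performs fatigue, distraction, and emotion detection together in real time. The lightweight design, built on MobileNetV2 and MediaPipe, reports 88.89% accuracy for fatigue detection and 100% for distraction detection on a 27-driver test set, and offers a complete technical roadmap for IMS development.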

1. Background and Euro NCAP 2026 Requirements

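Under the Euro NCAP 2026 protocol, the driver monitoring system (DMS) score rises from 2 points to 25 points, making it a core metric of the vehicle safety rating. The new protocol requires: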

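Detection type	2023 protocol	2026 protocol	Technical requirements
Fatigue detection	Optional	Mandatory	PERCLOS threshold, detection latency requirements
Distraction detection	Optional	Mandatory	Gaze deviation, phone-use detection
Emotion/state	None	New	Abnormal-behavior recognition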

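This article is based on a study published in January 2026 (available on PMC). It describes a unified detection framework that handles fatigue, distraction, and emotion in a single pipeline while meeting real-time requirements.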

2. System Architecture

2.1 Overall Pipeline

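Camera input → Face detection → 468-point landmark extraction (478 with iris refinement)
    ↓
    ├─ EAR computation → fatigue detection
    ├─ MAR computation → yawn detection
    ├─ Head pose + iris localization → distraction detection
    └─ CNN classification → emotion recognition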
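
All of the stages below index into a single landmark array. A minimal sketch of the extraction step, assuming the legacy MediaPipe Solutions API (mp.solutions.face_mesh.FaceMesh), which returns x and y normalized to [0, 1]; the helper names landmarks_to_pixels and extract_landmarks are hypothetical. Converting to pixels matters because normalized x and y are scaled by different image dimensions, which would distort aspect ratios such as EAR and MAR and break solvePnP. Passing refine_landmarks=True adds the 10 iris points (indices 468–477) used for gaze estimation.

```python
import numpy as np

def landmarks_to_pixels(face_landmarks, width, height):
    """Convert one face's normalized MediaPipe landmarks to an (N, 3) array in pixels.

    x and y are scaled by the image width and height; z (relative depth) is scaled
    by the width, since MediaPipe reports it on roughly the same scale as x.
    """
    return np.array(
        [(lm.x * width, lm.y * height, lm.z * width) for lm in face_landmarks.landmark],
        dtype=np.float64,
    )

def extract_landmarks(frame_bgr, face_mesh):
    """Run a Face Mesh instance on a BGR frame; return (N, 3) pixel landmarks or None."""
    rgb = np.ascontiguousarray(frame_bgr[:, :, ::-1])  # MediaPipe expects RGB input
    results = face_mesh.process(rgb)
    if not results.multi_face_landmarks:
        return None  # no face detected in this frame
    height, width = frame_bgr.shape[:2]
    return landmarks_to_pixels(results.multi_face_landmarks[0], width, height)

# Usage with the legacy MediaPipe Solutions API (assumed installed):
# import mediapipe as mp
# face_mesh = mp.solutions.face_mesh.FaceMesh(max_num_faces=1, refine_landmarks=True)
# landmarks = extract_landmarks(frame, face_mesh)  # 478 points with refine_landmarks=True
```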

2.2 Hardware Configuration

Component	Model	Specifications	Estimated cost
Camera	Generic USB CMOS	640×480 @ 30 fps	$15–30
Processor	Intel i7-12700	2.1 GHz P-core base clock	-
GPU	NVIDIA RTX 3070	CUDA 11.8	-

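Key finding: A camera that follows the USB Video Class (UVC) standard works with the operating system's built-in class driver, so no dedicated driver is needed, which greatly simplifies system integration.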

3. Core Algorithms in Detail

3.1 Fatigue Detection - Eye Aspect Ratio (EAR)

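Formula: EAR = (‖p2 − p6‖ + ‖p3 − p5‖) / (2 · ‖p1 − p4‖), where p1 and p4 are the two eye corners, p2 and p3 lie on the upper eyelid, and p6 and p5 are the lower-eyelid points below them.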

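import numpy as np

# landmarks: MediaPipe Face Mesh points as an (N, 3) NumPy array in pixel coordinates
# (normalized coordinates scale x and y by different image dimensions and distort the ratio)
# Eye landmark indices, in EAR order [p1, p2, p3, p4, p5, p6]:
# image-left eye (the driver's right eye): 33, 160, 158, 133, 153, 144
# image-right eye (the driver's left eye): 362, 385, 387, 263, 373, 380

def calculate_ear(landmarks, eye_indices):
    """
    Compute the Eye Aspect Ratio (EAR).

    Args:
        landmarks: (N, 3) array of MediaPipe Face Mesh landmarks
        eye_indices: [p1, p2, p3, p4, p5, p6]

    Returns:
        EAR value (roughly 0.25-0.30 for an open eye, near 0 when closed)
    """
    # Only x and y are used: distances are measured in the image plane
    p1 = landmarks[eye_indices[0], :2]  # eye corner
    p2 = landmarks[eye_indices[1], :2]  # upper eyelid 1
    p3 = landmarks[eye_indices[2], :2]  # upper eyelid 2
    p4 = landmarks[eye_indices[3], :2]  # opposite eye corner
    p5 = landmarks[eye_indices[4], :2]  # lower eyelid 2 (below p3)
    p6 = landmarks[eye_indices[5], :2]  # lower eyelid 1 (below p2)

    # Vertical distances
    vertical_1 = np.linalg.norm(p2 - p6)
    vertical_2 = np.linalg.norm(p3 - p5)

    # Horizontal distance
    horizontal = np.linalg.norm(p1 - p4)

    ear = (vertical_1 + vertical_2) / (2.0 * horizontal)
    return ear

# Average over both eyes
ear_left = calculate_ear(landmarks, [33, 160, 158, 133, 153, 144])
ear_right = calculate_ear(landmarks, [362, 385, 387, 263, 373, 380])
ear_avg = (ear_left + ear_right) / 2.0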

Threshold Settings

Statistics from 27 drivers:

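Parameter	Value	Source
Mean EAR (alert state)	0.257 ± 0.021	Measured
Fatigue threshold	0.23	≈ mean − 1.3σ
Trigger delay	3 consecutive frames	Suppresses false triggers from blinks

Note that 3 frames at 30 fps is only about 100 ms, while a normal blink typically lasts on the order of 100–400 ms, so a 3-frame window may not filter out every blink; longer windows, or time-based measures such as PERCLOS, are less sensitive to blinks.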
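
The threshold row follows from the table's own numbers: 0.257 − 1.3 × 0.021 = 0.2297 ≈ 0.23. Applying the same rule per driver is one way to realize the adaptive threshold strategy recommended below. A minimal sketch, assuming a short baseline of EAR values recorded while the driver is known to be alert; the function name is hypothetical and the default k = 1.3 is simply taken from the table.

```python
import numpy as np

def calibrate_ear_threshold(baseline_ears, k=1.3):
    """Per-driver fatigue threshold: mean minus k standard deviations of alert-state EAR."""
    ears = np.asarray(baseline_ears, dtype=np.float64)
    if ears.size < 2:
        raise ValueError("need at least two baseline EAR samples")
    return float(ears.mean() - k * ears.std())  # population std (ddof=0)

# Two samples with mean 0.257 and population std 0.021 reproduce the table's threshold
print(round(calibrate_ear_threshold([0.236, 0.278]), 4))  # 0.2297
```

In practice the baseline would be many frames from the first minutes of a drive rather than two values; the two-sample call only checks the arithmetic.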

Corresponding Euro NCAP requirements:

PERCLOS (Percentage of Eye Closure) computation
The OEM must declare its thresholds in the dossier
An adaptive threshold strategy is recommended
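
PERCLOS is the fraction of time the eyes are closed within a sliding window. A minimal sketch, assuming "closed" is approximated by EAR falling below the fatigue threshold (classical PERCLOS instead uses the P80 criterion, eyelid covering at least 80% of the pupil) and an illustrative one-minute window at 30 fps; the class name is hypothetical, and the PERCLOS level that triggers a warning is left to the threshold the OEM declares.

```python
from collections import deque

class Perclos:
    """Sliding-window PERCLOS: share of recent frames whose EAR is below a closure threshold."""

    def __init__(self, window_frames=60 * 30, ear_threshold=0.23):
        # Default window: 60 s at 30 fps (illustrative, not from the study)
        self.closed = deque(maxlen=window_frames)
        self.ear_threshold = ear_threshold

    def update(self, ear):
        """Add one frame's EAR; return PERCLOS over the frames seen so far (up to the window)."""
        self.closed.append(ear < self.ear_threshold)
        return sum(self.closed) / len(self.closed)
```

The deque's maxlen drops the oldest frame automatically, so each update costs O(window) for the sum; a running counter would make it O(1) if that matters on the target hardware.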

3.2 Yawn Detection - Mouth Aspect Ratio (MAR)

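import numpy as np

# Mouth landmark indices (MediaPipe Face Mesh, outer lip contour):
# corners 61 / 291; upper lip 37, 267; lower lip 84, 314
# (the indices 61, 65, 63, 67, 64, 66 follow dlib's 68-point layout; in Face Mesh,
#  63, 65 and 66 lie on the eyebrow, so they cannot measure mouth opening)
def calculate_mar(landmarks):
    """
    Compute the Mouth Aspect Ratio (MAR).

    Returns:
        MAR value (about 0.2-0.4 normally, > 0.6 during a yawn;
        recalibrate these thresholds for the index set actually used)
    """
    # Mouth corners
    p1 = landmarks[61, :2]   # image-left corner
    p4 = landmarks[291, :2]  # image-right corner

    # Upper and lower lip, paired vertically
    p2 = landmarks[37, :2]   # upper lip, left of center
    p3 = landmarks[267, :2]  # upper lip, right of center
    p5 = landmarks[314, :2]  # lower lip, right of center (below p3)
    p6 = landmarks[84, :2]   # lower lip, left of center (below p2)

    vertical_1 = np.linalg.norm(p2 - p6)
    vertical_2 = np.linalg.norm(p3 - p5)
    horizontal = np.linalg.norm(p1 - p4)

    mar = (vertical_1 + vertical_2) / (2.0 * horizontal)
    return mar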

Detection accuracy: 85.19% (27-subject test)

Challenges:

Some yawns do not open the mouth fully
Talking or laughing can trigger false detections
Temporal analysis is needed
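
One simple form of temporal analysis is a minimum-duration rule: talking and laughing open the mouth only briefly, whereas a yawn holds it open for a second or more. A minimal sketch, assuming a fixed frame rate; the class name and the default of 30 frames (about 1 s at 30 fps) are illustrative assumptions rather than values from the study.

```python
class YawnDetector:
    """Flag a yawn once MAR stays above a threshold for min_frames consecutive frames."""

    def __init__(self, mar_threshold=0.6, min_frames=30):
        self.mar_threshold = mar_threshold
        self.min_frames = min_frames  # ~1 s at 30 fps (illustrative)
        self.run = 0                  # length of the current open-mouth run

    def update(self, mar):
        """Return True exactly once per sustained opening, on the frame it reaches min_frames."""
        if mar > self.mar_threshold:
            self.run += 1
            return self.run == self.min_frames
        self.run = 0  # mouth closed again: a short opening (e.g. speech) never fires
        return False
```

Firing once per run, instead of on every frame above the threshold, lets the caller count yawns per minute, which is a more useful fatigue cue than a single yawn.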

3.3 Distraction Detection - Head Pose + Iris Localization

Head Pose Estimation

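import cv2
import numpy as np

def estimate_head_pose(landmarks, image_size=(640, 480)):
    """
    Estimate head pose from MediaPipe Face Mesh landmarks ((N, 3) array in pixels)

    Returns (degrees; signs assume a non-mirrored image, verify on your own setup):
        pitch: nodding (positive = head down)
        yaw: turning (positive = turning toward the driver's right)
        roll: tilting toward a shoulder
    """
    # Key points: nose tip, chin, outer eye corners, mouth corners
    # The 3D pose is solved with solvePnP

    # 3D model points (generic face model) in camera-style axes: x right, y down,
    # z away from the camera. This is the widely used y-up model with y and z negated,
    # so a frontal face yields R ≈ identity instead of a 180° flip about x
    model_points = np.array([
        (0.0, 0.0, 0.0),             # nose tip
        (0.0, 330.0, 65.0),          # chin
        (-225.0, -170.0, 135.0),     # outer corner of the image-left eye
        (225.0, -170.0, 135.0),      # outer corner of the image-right eye
        (-150.0, 150.0, 125.0),      # image-left mouth corner
        (150.0, 150.0, 125.0)        # image-right mouth corner
    ], dtype=np.float64)

    # 2D image points (x, y in pixels), in the same order as model_points
    image_points = landmarks[[1, 152, 33, 263, 61, 291], :2].astype(np.float64)

    # Approximate camera intrinsics: focal length ≈ image width, principal point at the center
    focal_length = image_size[0]
    center = (image_size[0] / 2, image_size[1] / 2)
    camera_matrix = np.array([
        [focal_length, 0, center[0]],
        [0, focal_length, center[1]],
        [0, 0, 1]
    ], dtype=np.float64)
    dist_coeffs = np.zeros((4, 1))  # assume no lens distortion

    # Solve PnP
    _, rotation_vector, _ = cv2.solvePnP(
        model_points, image_points, camera_matrix, dist_coeffs
    )

    # Convert to Euler angles by decomposing R = Rz(roll) @ Ry(yaw) @ Rx(pitch)
    rotation_mat, _ = cv2.Rodrigues(rotation_vector)
    pitch = np.arctan2(rotation_mat[2, 1], rotation_mat[2, 2])
    yaw = np.arctan2(-rotation_mat[2, 0], np.hypot(rotation_mat[0, 0], rotation_mat[1, 0]))
    roll = np.arctan2(rotation_mat[1, 0], rotation_mat[0, 0])

    return np.degrees(pitch), np.degrees(yaw), np.degrees(roll)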
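
Angle conventions are the usual source of head-pose bugs, and they can be checked offline without a camera. The NumPy-only sketch below builds a rotation matrix from known angles in the order R = Rz(roll) · Ry(yaw) · Rx(pitch) and decomposes it again; the function names are illustrative. The second usage line shows why the 3D model and the image must share an axis convention: diag(1, −1, −1), which is what a frontal face produces when a y-up, z-toward-camera model is matched to y-down image points, decomposes to a pitch of 180°.

```python
import numpy as np

def euler_to_matrix(pitch, yaw, roll):
    """Rotation matrix R = Rz(roll) @ Ry(yaw) @ Rx(pitch); angles in degrees."""
    p, y, r = np.radians([pitch, yaw, roll])
    rx = np.array([[1, 0, 0], [0, np.cos(p), -np.sin(p)], [0, np.sin(p), np.cos(p)]])
    ry = np.array([[np.cos(y), 0, np.sin(y)], [0, 1, 0], [-np.sin(y), 0, np.cos(y)]])
    rz = np.array([[np.cos(r), -np.sin(r), 0], [np.sin(r), np.cos(r), 0], [0, 0, 1]])
    return rz @ ry @ rx

def matrix_to_euler(rot):
    """Inverse of euler_to_matrix for |yaw| < 90 degrees; returns (pitch, yaw, roll) in degrees."""
    pitch = np.arctan2(rot[2, 1], rot[2, 2])
    yaw = np.arctan2(-rot[2, 0], np.hypot(rot[0, 0], rot[1, 0]))
    roll = np.arctan2(rot[1, 0], rot[0, 0])
    return np.degrees([pitch, yaw, roll])

# Round trip: recovers pitch=10, yaw=-20, roll=5 (up to floating-point rounding)
print(matrix_to_euler(euler_to_matrix(10.0, -20.0, 5.0)))
# Frontal face with mismatched model/image axes: pitch comes out as 180 degrees
print(matrix_to_euler(np.diag([1.0, -1.0, -1.0])))
```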

Iris Localization

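MediaPipe Face Mesh also provides iris landmarks (indices 468–477). They are only output when the refined mesh is enabled (refine_landmarks=True in the Python Face Mesh solution), which returns 478 points in total: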

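import numpy as np

# Iris landmark indices (refined mesh only): center + 4 surrounding points
LEFT_IRIS = [468, 469, 470, 471, 472]   # iris of the image-left eye
RIGHT_IRIS = [473, 474, 475, 476, 477]  # iris of the image-right eye

# Head-pose gate in degrees (illustrative values, not from the study; calibrate per vehicle)
HEAD_YAW_LIMIT = 30.0
HEAD_PITCH_LIMIT = 20.0

def get_gaze_direction(landmarks, head_pitch, head_yaw, image_size=(640, 480)):
    """
    Estimate gaze direction from head pose and iris position

    Directions are in image coordinates of a non-mirrored camera
    ('left' = toward the image left, i.e. the driver's right).

    Returns:
        'forward' | 'left' | 'right' | 'up' | 'down'
    """
    # A large head rotation means looking away, whatever the eyes do
    # (positive yaw = driver's right = image left; positive pitch = head down)
    if head_yaw > HEAD_YAW_LIMIT:
        return 'left'
    if head_yaw < -HEAD_YAW_LIMIT:
        return 'right'
    if head_pitch > HEAD_PITCH_LIMIT:
        return 'down'
    if head_pitch < -HEAD_PITCH_LIMIT:
        return 'up'

    # Iris centers
    left_iris_center = landmarks[468, :2]
    right_iris_center = landmarks[473, :2]

    # Eye centers (simplified: midpoint of the two eye corners)
    left_eye_center = (landmarks[33, :2] + landmarks[133, :2]) / 2
    right_eye_center = (landmarks[362, :2] + landmarks[263, :2]) / 2

    # Iris offset from the eye center, converted from pixels to normalized image units
    scale = np.array(image_size, dtype=np.float64)
    left_offset = (left_iris_center - left_eye_center) / scale
    right_offset = (right_iris_center - right_eye_center) / scale

    # Average the two eyes
    horizontal_offset = (left_offset[0] + right_offset[0]) / 2
    vertical_offset = (left_offset[1] + right_offset[1]) / 2

    # Thresholds in normalized image units
    H_THRESHOLD = 0.02   # horizontal
    V_THRESHOLD = 0.015  # vertical

    if horizontal_offset < -H_THRESHOLD:
        return 'left'
    elif horizontal_offset > H_THRESHOLD:
        return 'right'
    elif vertical_offset < -V_THRESHOLD:
        return 'up'
    elif vertical_offset > V_THRESHOLD:
        return 'down'
    else:
        return 'forward'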

Detection accuracy: 100% (27-subject test; every distraction behavior was correctly identified)
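
A consecutive-frame count only becomes a duration through the frame rate (10 frames is about 0.33 s at 30 fps), whereas distraction requirements are typically expressed as eyes-off-road time in seconds. A minimal sketch that keeps the threshold in seconds and converts it to frames; the class name and the 3 s default are placeholders to be replaced by the values of the applicable protocol.

```python
class OffRoadTimer:
    """Alert when gaze has been away from 'forward' for at least alert_after_s seconds."""

    def __init__(self, fps=30, alert_after_s=3.0):
        # Convert the time threshold to whole frames once; counting integers avoids
        # the floating-point drift of summing 1/fps on every frame
        self.fps = fps
        self.alert_frames = max(1, round(alert_after_s * fps))
        self.off_frames = 0

    def update(self, gaze):
        """Feed one per-frame gaze label; return True while the alert condition holds."""
        self.off_frames = 0 if gaze == 'forward' else self.off_frames + 1
        return self.off_frames >= self.alert_frames

    @property
    def off_road_seconds(self):
        return self.off_frames / self.fps
```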

3.4 Emotion Recognition - MobileNetV2 CNN

Model Architecture

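import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2, MobileNet_V2_Weights

# Emotion classes: RAF-DB's 7 basic expressions with anger and disgust merged -> 6 classes
EMOTIONS = ['neutral', 'happy', 'sad', 'surprise', 'anger/disgust', 'fear']

class EmotionNet(nn.Module):
    """
    MobileNetV2-based emotion recognition network
    Input: 224×224 RGB face crop (ImageNet-normalized)
    Output: one logit per emotion class (apply softmax for probabilities)
    """
    def __init__(self, num_classes=len(EMOTIONS)):
        super().__init__()

        # Load ImageNet-pretrained MobileNetV2 (weights= replaces the deprecated pretrained=True)
        mobilenet = mobilenet_v2(weights=MobileNet_V2_Weights.IMAGENET1K_V1)

        # Keep the feature extractor
        self.features = mobilenet.features

        # Replace the classification head
        self.classifier = nn.Sequential(
            nn.Dropout(0.2),
            nn.Linear(1280, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = x.mean([2, 3])  # Global Average Pooling
        x = self.classifier(x)
        return x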
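
The network expects a 224×224 face crop rather than the full camera frame. A minimal NumPy sketch of the cropping and normalization steps, assuming pixel-coordinate landmarks and the standard ImageNet mean/std used with torchvision's pretrained weights; the function names and the 20% margin are illustrative. The resize to 224×224 (for example with cv2.resize) is only shown in comments so the sketch itself has no OpenCV dependency.

```python
import numpy as np

# ImageNet statistics used by torchvision's pretrained weights
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def face_box(landmarks_px, image_w, image_h, margin=0.2):
    """Square (left, top, right, bottom) box around the landmarks, enlarged by margin, clipped."""
    x0, y0 = landmarks_px[:, 0].min(), landmarks_px[:, 1].min()
    x1, y1 = landmarks_px[:, 0].max(), landmarks_px[:, 1].max()
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half = max(x1 - x0, y1 - y0) * (1 + margin) / 2
    left = int(round(max(0.0, cx - half)))
    top = int(round(max(0.0, cy - half)))
    right = int(round(min(float(image_w), cx + half)))
    bottom = int(round(min(float(image_h), cy + half)))
    return left, top, right, bottom

def to_model_input(face_rgb_224):
    """(224, 224, 3) uint8 RGB crop -> contiguous (1, 3, 224, 224) float32, ImageNet-normalized."""
    x = face_rgb_224.astype(np.float32) / 255.0
    x = (x - IMAGENET_MEAN) / IMAGENET_STD
    return np.ascontiguousarray(x.transpose(2, 0, 1)[np.newaxis], dtype=np.float32)

# Usage (OpenCV and PyTorch assumed; frame is BGR):
# l, t, r, b = face_box(landmarks, 640, 480)
# crop = cv2.resize(frame[t:b, l:r], (224, 224))[:, :, ::-1]   # BGR -> RGB
# logits = model(torch.from_numpy(to_model_input(crop)))
```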

Training Configuration

Parameter	Value
Dataset	RAF-DB (30,000 images)
Input size	224×224
Batch size	32
Epochs	15
Optimizer	Adam
Learning rate	0.0003
Loss function	Cross-entropy

Detection Accuracy

Emotion	Accuracy	Notes
Happy	100%	Pronounced expression
Anger/disgust	96.3%	Merged class
Surprise	92.6%	Pronounced expression
Sad	66.7%	Subtle expression
Fear	0%	Dataset mismatch

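Key finding: Fear detection fails because the fear expressions in RAF-DB are far more exaggerated than those of real drivers, so the training data does not match the driving scenario. Domain adaptation or a driving-specific dataset is needed.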

4. System Integration and Real-Time Operation

4.1 Priority Mechanism

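from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Optional

IMAGE_LEFT_EYE = [33, 160, 158, 133, 153, 144]
IMAGE_RIGHT_EYE = [362, 385, 387, 263, 373, 380]

@dataclass
class Alert:
    kind: str
    severity: int
    direction: Optional[str] = None

@dataclass
class Status:
    ear: float
    mar: float
    gaze: str
    emotion: Any = None

class DMSDetector:
    def __init__(self, get_landmarks, emotion_fn=None, image_size=(640, 480)):
        self.get_landmarks = get_landmarks  # frame -> (N, 3) pixel landmarks, or None if no face
        self.emotion_fn = emotion_fn        # optional (frame, landmarks) -> emotion label
        self.image_size = image_size
        self.counter = defaultdict(int)     # consecutive-frame counters per alert type

    def detect(self, frame):
        # 1. Face detection and landmark extraction
        landmarks = self.get_landmarks(frame)
        if landmarks is None:
            self.counter.clear()  # face lost: restart all consecutive-frame counts
            return None

        # 2. Evaluate every cue on every frame so each counter sees consecutive frames
        ear = (calculate_ear(landmarks, IMAGE_LEFT_EYE) + calculate_ear(landmarks, IMAGE_RIGHT_EYE)) / 2.0
        mar = calculate_mar(landmarks)
        pitch, yaw, roll = estimate_head_pose(landmarks, self.image_size)
        gaze = get_gaze_direction(landmarks, pitch, yaw)
        eyes_closed = self.persistent_check('fatigue', ear < 0.23, 3)
        yawning = self.persistent_check('yawn', mar > 0.6, 5)
        distracted = self.persistent_check('distraction', gaze != 'forward', 10)

        # 3. Report by priority: eye closure (fatigue) > yawn > distraction
        if eyes_closed:
            return Alert('FATIGUE', severity=2)
        if yawning:
            return Alert('FATIGUE', severity=1)
        if distracted:
            return Alert('DISTRACTION', severity=1, direction=gaze)

        # Emotion recognition (background task); emotion_fn must crop the face itself,
        # since the network expects a 224×224 face image rather than the raw frame
        emotion = self.emotion_fn(frame, landmarks) if self.emotion_fn else None
        return Status(ear, mar, gaze, emotion)

    def persistent_check(self, alert_type, condition, frames):
        """
        Consecutive-frame check to suppress transient false detections:
        any frame where the condition is false resets the count
        """
        self.counter[alert_type] = self.counter[alert_type] + 1 if condition else 0
        return self.counter[alert_type] >= frames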

4.2 Performance Metrics

Metric	Value
Frame rate	30 fps (640×480)
Per-frame latency	< 33 ms
GPU utilization	~40% (RTX 3070)
CPU utilization	~30% (i7-12700)
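
At 30 fps the per-frame budget is 1000 / 30 ≈ 33.3 ms, consistent with the latency row above. Averages can hide spikes, so checking a tail percentile as well gives a more convincing budget check. A minimal standard-library sketch; process_frame stands in for whatever per-frame function is being profiled (for example a detector's detect method), and with asynchronous GPU work such as PyTorch CUDA kernels that function should synchronize (torch.cuda.synchronize()) before returning, otherwise the timing under-reports.

```python
import statistics
import time

BUDGET_MS = 1000.0 / 30  # per-frame budget at 30 fps, about 33.3 ms

def measure_latency_ms(process_frame, frames, warmup=5):
    """Return (mean, p95) wall-clock latency in ms of process_frame over frames[warmup:]."""
    for frame in frames[:warmup]:
        process_frame(frame)  # warm-up calls (caches, lazy initialization) are not timed
    samples = []
    for frame in frames[warmup:]:
        start = time.perf_counter()
        process_frame(frame)
        samples.append((time.perf_counter() - start) * 1000.0)
    if not samples:
        raise ValueError("need more frames than warm-up calls")
    samples.sort()
    p95 = samples[min(len(samples) - 1, int(0.95 * len(samples)))]  # nearest-rank style
    return statistics.fmean(samples), p95
```

A run meets the budget when p95 stays below BUDGET_MS, not just the mean.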

5. Implications for IMS Development

5.1 Choosing a Technical Approach

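Approach	Pros	Cons	Recommended for
Vision only (this approach)	Non-intrusive, low cost	Sensitive to lighting	Mass-production passenger cars
Multi-sensor fusion	High accuracy	High cost	High-end models
Physiological signals (EEG/HRV)	High accuracy	Intrusive	Research and validation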

5.2 Deployment Optimization Suggestions

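import torch

model = EmotionNet().eval()                # trained network from Section 3.4
dummy_input = torch.randn(1, 3, 224, 224)  # example input with the deployed shape

# 1. Dynamic INT8 quantization: only nn.Linear layers are converted, so for a
#    convolution-dominated network such as MobileNetV2 the speed-up is small (CPU backends only)
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# 2. ONNX export of the float model
torch.onnx.export(
    model,
    dummy_input,
    "dms_model.onnx",
    opset_version=11
)

# 3. TensorRT optimization: TensorRT targets NVIDIA GPUs (e.g., Jetson / DRIVE platforms);
#    Qualcomm Hexagon requires Qualcomm's own toolchain instead (e.g., SNPE or QNN)
import tensorrt as trt
# Build a TensorRT engine from dms_model.onnx...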

5.3 Euro NCAP Compliance Checklist

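PERCLOS computation implemented
Threshold declaration documented
Detection latency verified
Distraction scenarios covered
Warning escalation levels
False-alarm rate statistics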
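
For the latency and false-alarm items, one simple offline evaluation matches each ground-truth event to the first alert that follows it within a tolerance window and counts the leftover alerts as false alarms. A minimal sketch; the greedy matching rule, the 3 s tolerance, and the per-hour normalization are assumptions for illustration, not a procedure prescribed by Euro NCAP.

```python
def evaluate_alerts(event_onsets_s, alert_times_s, drive_hours, max_latency_s=3.0):
    """Greedy offline scoring of alerts against ground-truth event onsets (times in seconds)."""
    alerts = sorted(alert_times_s)
    used = [False] * len(alerts)
    latencies, missed = [], 0
    for onset in sorted(event_onsets_s):
        # Match the event to the earliest unused alert inside [onset, onset + max_latency_s]
        for i, t in enumerate(alerts):
            if not used[i] and onset <= t <= onset + max_latency_s:
                used[i] = True
                latencies.append(t - onset)
                break
        else:
            missed += 1  # no alert fired in time for this event
    return {
        "latencies_s": latencies,
        "missed_events": missed,
        "false_alarms_per_hour": used.count(False) / drive_hours,
    }
```

For example, events at 10 s and 50 s with alerts at 11.5 s, 30 s and 52 s over a half-hour drive give latencies of 1.5 s and 2.0 s, no missed events, and one false alarm, i.e. 2 false alarms per hour.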

6. Summary

This study presents a unified DMS framework based on MobileNetV2 and MediaPipe that achieves:

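Fatigue detection: EAR thresholding, 88.89% accuracy
Distraction detection: head pose + iris localization, 100% accuracy
Emotion recognition: CNN classification, high accuracy for happy, anger/disgust, and surprise, but weak on sadness (66.7%) and fear (0%)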

Core strengths:

Real-time operation (30 fps)
Lightweight architecture (MobileNetV2)
Works with standard cameras
Meets the basic Euro NCAP 2026 requirements

Future work:

Adaptive threshold calibration
A dataset for fear expressions
Low-light optimization
Multi-sensor fusion

References:

“Driver Monitoring System Using Computer Vision for Real-Time Detection of Fatigue, Distraction and Emotion via Facial Landmarks and Deep Learning”, Sensors 2026
MediaPipe Face Mesh Documentation
Euro NCAP 2026 Assessment Protocol
