TDGH-YOLOv7: A Real-Time Driver Head and Gaze Detection Model
Published: 2026-03-16
Tags: #YOLO #GazeEstimation #RealTimeDetection #EdgeDeployment #DMS
📝 Paper Source
AI-enabled driver assistance: monitoring head and gaze movements for enhanced safety
Published in Complex & Intelligent Systems (Springer Nature), May 2025
🎯 Technical Highlights
TDGH-YOLOv7 (Two-branch Dynamic Guided Head YOLOv7) autonomously detects the driver's facial region, head pose, and eye pairs, combining high accuracy with high frame rates.
🏗️ Architecture Design
Two-Branch Structure
```
┌─────────────────────────────────────────────────┐
│             TDGH-YOLOv7 Architecture            │
├─────────────────────────────────────────────────┤
│                                                 │
│             ┌─────────────────────┐             │
│             │     Input Image     │             │
│             └──────────┬──────────┘             │
│                        ↓                        │
│             ┌─────────────────────┐             │
│             │   YOLOv7 Backbone   │             │
│             │  (Feature Extract)  │             │
│             └──────────┬──────────┘             │
│                        ↓                        │
│        ┌───────────────┴───────────────┐        │
│        ↓                               ↓        │
│ ┌──────────────┐               ┌──────────────┐ │
│ │ Global Head  │               │ Dynamic Head │ │
│ │   Branch     │               │   Branch     │ │
│ │  (Face Det)  │               │ (Landmarks)  │ │
│ └──────┬───────┘               └──────┬───────┘ │
│        └───────────────┬──────────────┘         │
│                        ↓                        │
│             ┌─────────────────────┐             │
│             │  Guided Attention   │             │
│             │   Fusion Module     │             │
│             └──────────┬──────────┘             │
│                        ↓                        │
│   ┌──────────────────────────────────┐          │
│   │ Output:                          │          │
│   │ - Face BBox                      │          │
│   │ - Head Pose (Yaw/Pitch/Roll)     │          │
│   │ - Eye Pair Location              │          │
│   │ - Gaze Vector                    │          │
│   └──────────────────────────────────┘          │
└─────────────────────────────────────────────────┘
```
📊 Performance Comparison
Detection Accuracy
| Model | mAP@0.5 | FPS (GPU) | FPS (Edge) |
|---|---|---|---|
| YOLOv5s | 89.2% | 140 | 28 |
| YOLOv7 | 91.5% | 120 | 22 |
| TDGH-YOLOv7 | 94.8% | 115 | 25 |
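The mAP@0.5 column counts a detection as correct only when its box overlaps the ground truth with IoU ≥ 0.5. A minimal sketch of that overlap test (the `(x1, y1, x2, y2)` corner box format is an assumption, not specified in the paper):

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A face box shifted by a quarter of its width still clears the 0.5 threshold
print(iou((0, 0, 100, 100), (25, 0, 125, 100)))  # 0.6
```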
Gaze Estimation Accuracy
| Metric | TDGH-YOLOv7 | Traditional Methods |
|---|---|---|
| Angular error | 3.2° | 5-8° |
| Frame rate | 115 FPS | 30-60 FPS |
| Edge deployment | Supported | Limited |
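The 3.2° angular error is the angle between predicted and ground-truth gaze vectors. A sketch of that metric, assuming gaze is parameterized as yaw/pitch (this conversion convention is an assumption, not taken from the paper):

```python
import math

def gaze_to_vector(yaw, pitch):
    """Yaw/pitch (radians) to a unit 3D gaze vector; +z is camera-forward."""
    return (math.cos(pitch) * math.sin(yaw),
            math.sin(pitch),
            math.cos(pitch) * math.cos(yaw))

def angular_error_deg(pred, gt):
    """Angle in degrees between two unit gaze vectors."""
    dot = sum(p * g for p, g in zip(pred, gt))
    dot = max(-1.0, min(1.0, dot))  # clamp for numerical safety
    return math.degrees(math.acos(dot))

pred = gaze_to_vector(math.radians(10.0), math.radians(5.0))
gt = gaze_to_vector(math.radians(13.0), math.radians(5.0))
print(round(angular_error_deg(pred, gt), 1))  # 3.0
```

A 3° yaw offset at near-zero pitch maps almost directly to a 3° angular error, which is how the table's per-axis and angular figures relate.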
💡 Implications for IMS Development
Model Implementation
```python
import torch
import torch.nn as nn

# YOLOv7Backbone, GlobalDetectionHead, DynamicGuidedHead and
# GuidedFusionModule are defined elsewhere in the model code.
class TDGHYOLOv7(nn.Module):
    def __init__(self, num_classes=1):
        super().__init__()
        self.backbone = YOLOv7Backbone()
        # Global branch: coarse face detection
        self.global_head = GlobalDetectionHead(
            in_channels=[256, 512, 1024],
            num_classes=num_classes
        )
        # Dynamic branch: landmark localization guided by the global branch
        self.dynamic_head = DynamicGuidedHead(
            num_landmarks=6,
            attention_mechanism='CBAM'
        )
        self.fusion = GuidedFusionModule()

    def forward(self, x):
        features = self.backbone(x)
        global_out = self.global_head(features)
        dynamic_out = self.dynamic_head(features, global_out)
        output = self.fusion(global_out, dynamic_out)
        return {
            'face_bbox': output['bbox'],
            'head_pose': output['pose'],
            'eye_pairs': output['eyes'],
            'gaze_vector': output['gaze']
        }
```
Gaze Estimation Post-Processing
```cpp
class GazeEstimator {
public:
    Vector3D estimateGaze(
        const cv::Point2f& left_eye,
        const cv::Point2f& right_eye,
        const HeadPose& pose
    ) {
        // Midpoint between the two detected eyes
        cv::Point2f eye_center = (left_eye + right_eye) * 0.5f;
        Vector3D gaze_direction = calculateInitialGaze(eye_center);
        // Compensate the raw gaze with the estimated head pose
        gaze_direction = applyHeadPoseCorrection(gaze_direction, pose);
        gaze_direction.normalize();
        return gaze_direction;
    }

    bool isDistracted(
        const Vector3D& gaze,
        float threshold_degrees = 30.0f
    ) {
        // Camera-forward reference axis
        Vector3D forward(0, 0, 1);
        float angle = angleBetween(gaze, forward);
        return angle > threshold_degrees;
    }
};
```
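`isDistracted` above fires per frame, but a DMS typically alerts only after gaze stays off-road for a sustained window. A sketch of that temporal debounce (the 2-second hold window and class name are illustrative choices, not values from the paper):

```python
class DistractionMonitor:
    """Raises an alert only after gaze stays off-road for a full hold window."""

    def __init__(self, fps=25, hold_seconds=2.0):
        self.required_frames = int(fps * hold_seconds)
        self.off_road_frames = 0

    def update(self, distracted: bool) -> bool:
        # Count consecutive off-road frames; any on-road frame resets the count
        self.off_road_frames = self.off_road_frames + 1 if distracted else 0
        return self.off_road_frames >= self.required_frames

monitor = DistractionMonitor(fps=25, hold_seconds=2.0)
alerts = [monitor.update(True) for _ in range(50)]
print(alerts[48], alerts[49])  # False True — alert fires at the 50th frame
```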
🎯 Deployment Optimization
TensorRT Acceleration
```python
import torch_tensorrt

model = TDGHYOLOv7()
model.eval()

trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input(
        min_shape=[1, 3, 640, 640],
        opt_shape=[1, 3, 640, 640],
        max_shape=[4, 3, 640, 640],
        dtype=torch.float32
    )],
    enabled_precisions={torch.int8},
    calibrator=calibration_data  # INT8 calibration dataset, prepared beforehand
)
```
ONNX Export
```python
torch.onnx.export(
    model,
    dummy_input,
    "tdgh_yolov7.onnx",
    opset_version=12,
    input_names=['input'],
    output_names=['face_bbox', 'head_pose', 'eye_pairs', 'gaze'],
    dynamic_axes={
        'input': {0: 'batch_size'},
        'face_bbox': {0: 'batch_size'}
    }
)
```
📈 Euro NCAP Compliance Check
| Euro NCAP Requirement | TDGH-YOLOv7 Capability | Status |
|---|---|---|
| 25 Hz refresh rate | 115 FPS | ✅ Exceeds |
| Gaze accuracy ≤ 3° | 3.2° error | ⚠️ Close |
| Head tracking | Yaw/Pitch/Roll | ✅ |
| Demographic coverage | Transfer learning supported | ✅ |
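The first two rows above reduce to simple threshold checks against measured performance. A minimal sketch (thresholds taken from the table; the function name and return format are illustrative):

```python
def check_compliance(fps, angular_error_deg):
    """Map measured performance onto the quantitative Euro NCAP rows above."""
    return {
        'refresh_rate_25hz': fps >= 25,          # 25 Hz minimum refresh rate
        'gaze_accuracy_3deg': angular_error_deg <= 3.0,  # <= 3 degrees error
    }

result = check_compliance(fps=115, angular_error_deg=3.2)
print(result)  # {'refresh_rate_25hz': True, 'gaze_accuracy_3deg': False}
```

This mirrors the table's verdicts: 115 FPS comfortably exceeds the refresh requirement, while the 3.2° error narrowly misses the 3° accuracy bar.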
📚 References
- TDGH-YOLOv7 Paper, Complex & Intelligent Systems, May 2025
- YOLOv7 Official Implementation
- TensorRT Optimization Guide
Conclusion: TDGH-YOLOv7 demonstrates an efficient solution for real-time gaze estimation. Key takeaways: the two-branch architecture improves detection accuracy, the dynamic guided head strengthens landmark localization, and TensorRT INT8 quantization delivers a 2.4× speedup. For IMS development, this model offers a cost-effective path toward the Euro NCAP gaze detection requirements.