Radar-Camera Fusion for In-Cabin Sensing: A 2024 Survey Walkthrough with CPD Applications

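Paper Information

  • Title: Radar and Camera Fusion for Object Detection and Tracking: A Comprehensive Survey
  • Venue: arXiv
  • Paper ID: arXiv:2410.19872
  • Published: October 24, 2024
  • Availability: PDF and HTML versions are public

Core Contribution

The survey systematically reviews the datasets, algorithms, and application scenarios of radar-camera fusion from 2019 to 2024. Its taxonomy offers useful technical guidance for in-cabin sensing, in particular child presence detection (CPD) and occupant monitoring systems (OMS).

Key Content

  1. Fusion taxonomy: early fusion, feature fusion, decision fusion
  2. Dataset summary: nuScenes, Waymo Open, K-Radar, and others
  3. Key challenges: sensor calibration, data alignment, modality complementarity
  4. Future directions: self-supervised learning, Transformer architectures, multi-task learning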

Fusion Architecture in Detail

1. Fusion Level Taxonomy

from enum import Enum
from typing import List, Dict
import numpy as np

# NOTE: RadarProcessor, CameraProcessor, EarlyFusionModule and
# HybridFusionModule are placeholders that are not defined in this post.


class FusionLevel(Enum):
    """Fusion level."""
    EARLY = "early"        # data-level fusion
    FEATURE = "feature"    # feature-level fusion
    DECISION = "decision"  # decision-level fusion
    HYBRID = "hybrid"      # hybrid fusion


class RadarCameraFusion:
    """
    Basic radar-camera fusion framework.

    Implements the architecture shown in Figure 2 of the paper.
    """

    def __init__(self, fusion_level: FusionLevel):
        self.fusion_level = fusion_level
        self.radar_processor = RadarProcessor()
        self.camera_processor = CameraProcessor()
        self.fusion_module = self._init_fusion_module()

    def _init_fusion_module(self):
        """Create the fusion module for the configured level."""
        if self.fusion_level == FusionLevel.EARLY:
            return EarlyFusionModule()
        elif self.fusion_level == FusionLevel.FEATURE:
            return FeatureFusionModule()
        elif self.fusion_level == FusionLevel.DECISION:
            return DecisionFusionModule()
        else:
            return HybridFusionModule()

    def process(self, radar_data: dict, camera_data: dict) -> dict:
        """
        Run the fusion pipeline.

        Args:
            radar_data: radar data {point_cloud, range_doppler, ...}
            camera_data: camera data {rgb_image, depth_map, ...}

        Returns:
            Detection results {objects, confidence, ...}
        """
        # Radar branch
        radar_features = self.radar_processor.extract(radar_data)

        # Camera branch
        camera_features = self.camera_processor.extract(camera_data)

        # Fusion. Every module is assumed to expose a common fuse() method;
        # the modules shown later in this post name it forward() and
        # fuse_detections(), so a thin adapter is needed in practice.
        fused_output = self.fusion_module.fuse(radar_features, camera_features)

        return fused_output
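
The framework above dispatches to an early-fusion module that the post does not show. As a rough illustration of what data-level fusion can look like, the sketch below projects radar points into the image plane with a pinhole model and rasterizes them into a sparse depth channel that could be stacked onto the RGB image. The function names, the intrinsics tuple, and the translation-only extrinsics (sensor axes assumed already aligned) are all assumptions made for this example, not details from the survey.

```python
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]


def radar_to_camera(point_radar: Point3D, translation: Point3D) -> Point3D:
    """Move a radar point into the camera frame.

    Assumption: the two sensors share axis orientation, so the extrinsics
    reduce to a translation. A real setup also needs a rotation.
    """
    return tuple(p + t for p, t in zip(point_radar, translation))


def project_to_pixel(point_cam: Point3D, fx: float, fy: float,
                     cx: float, cy: float) -> Optional[Tuple[int, int]]:
    """Pinhole projection of a camera-frame point (z forward) to (u, v)."""
    x, y, z = point_cam
    if z <= 0:
        return None  # behind the camera
    return int(round(fx * x / z + cx)), int(round(fy * y / z + cy))


def build_depth_channel(points_radar: Sequence[Point3D],
                        translation: Point3D,
                        intrinsics: Tuple[float, float, float, float],
                        width: int, height: int) -> List[List[float]]:
    """Rasterize radar points into a sparse depth image (0.0 = no return)."""
    fx, fy, cx, cy = intrinsics
    depth = [[0.0] * width for _ in range(height)]
    for point in points_radar:
        point_cam = radar_to_camera(point, translation)
        pixel = project_to_pixel(point_cam, fx, fy, cx, cy)
        if pixel is None:
            continue
        u, v = pixel
        if 0 <= u < width and 0 <= v < height:
            z = point_cam[2]
            # Keep the nearest return when several points hit one pixel
            if depth[v][u] == 0.0 or z < depth[v][u]:
                depth[v][u] = z
    return depth
```

The resulting channel is very sparse, which is one reason early fusion depends so heavily on accurate calibration: a small extrinsic error moves every radar return onto the wrong pixels.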

2. Feature-Level Fusion Implementation

import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureFusionModule(nn.Module):
    """
    Feature-level fusion module.

    Paper Section 4.2: fuse in feature space.
    Advantages: keeps intermediate representations and trains end to end.
    """

    def __init__(self, radar_dim: int = 64, camera_dim: int = 256,
                 fused_dim: int = 128):
        super().__init__()

        # Radar feature encoder
        self.radar_encoder = nn.Sequential(
            nn.Conv2d(radar_dim, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, fused_dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(fused_dim),
            nn.ReLU(inplace=True)
        )

        # Camera feature encoder
        self.camera_encoder = nn.Sequential(
            nn.Conv2d(camera_dim, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, fused_dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(fused_dim),
            nn.ReLU(inplace=True)
        )

        # Fusion layers
        self.fusion_conv = nn.Sequential(
            nn.Conv2d(fused_dim * 2, fused_dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(fused_dim),
            nn.ReLU(inplace=True),
            nn.Conv2d(fused_dim, fused_dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(fused_dim),
            nn.ReLU(inplace=True)
        )

        # Channel attention over the concatenated features
        self.attention = ChannelAttention(fused_dim * 2)

    def forward(self, radar_feat: torch.Tensor,
                camera_feat: torch.Tensor) -> torch.Tensor:
        """
        Fuse radar and camera features.

        Args:
            radar_feat: radar features (B, C_r, H, W)
            camera_feat: camera features (B, C_c, H', W')

        Returns:
            Fused features (B, fused_dim, H, W)
        """
        # Encode each modality
        radar_encoded = self.radar_encoder(radar_feat)
        camera_encoded = self.camera_encoder(camera_feat)

        # Spatial alignment: resample the camera features to the radar
        # feature size
        if radar_encoded.shape[2:] != camera_encoded.shape[2:]:
            camera_encoded = F.interpolate(
                camera_encoded,
                size=radar_encoded.shape[2:],
                mode='bilinear',
                align_corners=False
            )

        # Concatenate along the channel axis
        concat_feat = torch.cat([radar_encoded, camera_encoded], dim=1)

        # Re-weight channels with attention
        attended_feat = self.attention(concat_feat)

        # Fuse
        fused_feat = self.fusion_conv(attended_feat)

        return fused_feat


class ChannelAttention(nn.Module):
    """Channel attention module."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()

        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)

        # Shared bottleneck MLP applied to both pooled descriptors
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False)
        )

        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, C, _, _ = x.size()

        avg_out = self.fc(self.avg_pool(x).view(B, C))
        max_out = self.fc(self.max_pool(x).view(B, C))

        # Per-channel gate in (0, 1), broadcast over H and W
        attention = self.sigmoid(avg_out + max_out).view(B, C, 1, 1)

        return x * attention
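
The spatial-alignment step in forward() exists because the two branches rarely produce feature maps of the same size. The arithmetic below is a small sketch of why, using the standard convolution output-size formula. The input sizes (a 64x64 range-Doppler map and a 224x224 image) are assumptions chosen for illustration; the layer parameters mirror a stride-2 7x7 radar stem and the down-sampling stages of a ResNet-18 trunk, which is what the CPD example in this post uses.

```python
from typing import List, Tuple

# (kernel, stride, padding) for each layer that changes spatial size
Layer = Tuple[int, int, int]


def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def trunk_out(size: int, layers: List[Layer]) -> int:
    """Apply conv_out layer by layer."""
    for kernel, stride, padding in layers:
        size = conv_out(size, kernel, stride, padding)
    return size


# Radar stem: a single 7x7 convolution with stride 2 and padding 3
RADAR_STEM: List[Layer] = [(7, 2, 3)]

# ResNet-18 trunk without avgpool/fc. Only the down-sampling stages are
# listed (conv1, maxpool, layer2, layer3, layer4); layer1 keeps the size.
RESNET18_TRUNK: List[Layer] = [(7, 2, 3), (3, 2, 1), (3, 2, 1), (3, 2, 1), (3, 2, 1)]

radar_hw = trunk_out(64, RADAR_STEM)        # assumed 64x64 range-Doppler map
camera_hw = trunk_out(224, RESNET18_TRUNK)  # assumed 224x224 image
```

Under these assumptions the radar branch yields a 32x32 map and the camera branch a 7x7 map, so the camera features are interpolated up by more than a factor of four before concatenation.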

3. Decision-Level Fusion Implementation

from typing import List, Optional

import numpy as np

# NOTE: RadarDetector, CameraDetector and MultiObjectTracker are placeholders
# that are not defined in this post.


class DecisionFusionModule:
    """
    Decision-level fusion module.

    Paper Section 4.3: detect independently, then fuse the results.
    Advantages: modular and easy to interpret.
    """

    def __init__(self, confidence_threshold: float = 0.5):
        self.confidence_threshold = confidence_threshold

        # Radar detector
        self.radar_detector = RadarDetector()

        # Camera detector
        self.camera_detector = CameraDetector()

        # Tracker
        self.tracker = MultiObjectTracker()

    def fuse_detections(self, radar_objects: List[dict],
                        camera_objects: List[dict]) -> List[dict]:
        """
        Fuse detection results.

        Args:
            radar_objects: radar detections [{position, velocity, confidence}, ...]
            camera_objects: camera detections [{bbox, class, confidence}, ...]

        Returns:
            Fused detections
        """
        fused_objects = []
        matched_radar_ids = set()  # id() of radar objects already fused

        # Spatial association
        for cam_obj in camera_objects:
            # Convert the camera bbox to a 3D position
            cam_position = self._bbox_to_position(cam_obj['bbox'])

            # Look for a matching radar target
            matched_radar = self._find_match(cam_position, radar_objects)

            if matched_radar is not None:
                matched_radar_ids.add(id(matched_radar))
                # Fuse the two detections
                fused_obj = {
                    'id': cam_obj.get('id', 0),
                    'position': matched_radar['position'],  # radar gives accurate range
                    'velocity': matched_radar['velocity'],  # radar gives velocity
                    'class': cam_obj['class'],              # camera gives the class
                    'bbox': cam_obj['bbox'],                # camera gives the bounding box
                    'confidence': (cam_obj['confidence'] +
                                   matched_radar['confidence']) / 2,
                    'sensor': 'radar_camera'
                }
            else:
                # Camera-only detection
                fused_obj = {
                    'id': cam_obj.get('id', 0),
                    'position': cam_position,
                    'velocity': None,
                    'class': cam_obj['class'],
                    'bbox': cam_obj['bbox'],
                    'confidence': cam_obj['confidence'],
                    'sensor': 'camera_only'
                }

            fused_objects.append(fused_obj)

        # Add targets seen only by the radar
        for radar_obj in radar_objects:
            if id(radar_obj) not in matched_radar_ids:
                fused_objects.append({
                    'id': radar_obj.get('id', 0),
                    'position': radar_obj['position'],
                    'velocity': radar_obj['velocity'],
                    'class': 'unknown',
                    'bbox': None,
                    'confidence': radar_obj['confidence'],
                    'sensor': 'radar_only'
                })

        return fused_objects

    def _bbox_to_position(self, bbox) -> np.ndarray:
        """Convert a camera bbox to a 3D position in the radar frame.

        Not given in the post: it depends on the camera intrinsics, the
        radar-camera extrinsics and a depth estimate.
        """
        raise NotImplementedError

    def _find_match(self, position: np.ndarray,
                    radar_objects: List[dict],
                    threshold: float = 0.5) -> Optional[dict]:
        """Return the first radar object within `threshold` of `position`."""
        for radar_obj in radar_objects:
            distance = np.linalg.norm(
                np.asarray(position) - np.asarray(radar_obj['position']))
            if distance < threshold:
                return radar_obj
        return None
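
The matcher above returns the first radar object inside the distance threshold, so one radar target can be claimed by several camera detections and the result depends on list order. One common alternative is a greedy global nearest-neighbour association, sketched below: collect all camera-radar pairs inside the gate, sort them by distance, and accept each pair only if both sides are still free. The function name `associate` and the 0.5 m default gate are assumptions for this sketch; an optimal assignment (for example the Hungarian algorithm) would be the next step up.

```python
import math
from typing import Dict, List, Sequence, Tuple

Position = Sequence[float]


def associate(cam_positions: Sequence[Position],
              radar_positions: Sequence[Position],
              gate: float = 0.5) -> Tuple[Dict[int, int], List[int], List[int]]:
    """Greedy one-to-one association of camera and radar detections.

    Returns (matches, unmatched_cam, unmatched_radar), where matches maps a
    camera index to a radar index.
    """
    # All candidate pairs that fall inside the distance gate
    candidates = []
    for i, cam in enumerate(cam_positions):
        for j, radar in enumerate(radar_positions):
            distance = math.dist(cam, radar)
            if distance < gate:
                candidates.append((distance, i, j))

    # Closest pairs are accepted first; each index is used at most once
    candidates.sort()
    matches: Dict[int, int] = {}
    used_radar = set()
    for _, i, j in candidates:
        if i in matches or j in used_radar:
            continue
        matches[i] = j
        used_radar.add(j)

    unmatched_cam = [i for i in range(len(cam_positions)) if i not in matches]
    unmatched_radar = [j for j in range(len(radar_positions)) if j not in used_radar]
    return matches, unmatched_cam, unmatched_radar
```

The three return values map directly onto the 'radar_camera', 'camera_only' and 'radar_only' branches of fuse_detections.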

In-Cabin Sensing Applications

1. CPD: Child Presence Detection

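import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

# NOTE: Radar60GHz and RGBIRCamera are placeholder driver classes that are
# not defined in this post.


class CPDDetector:
    """
    Child presence detection system.

    Fuses a 60 GHz radar with an RGB-IR camera,
    targeting the Euro NCAP 2026 CPD requirements.
    """

    # Detection states
    DETECTION_STATES = {
        'EMPTY': 0,           # empty seat
        'CHILD_REAR': 1,      # child in the rear row
        'CHILD_FORWARD': 2,   # forward-facing child seat
        'CHILD_REARWARD': 3,  # rearward-facing child seat
        'ADULT': 4,           # adult
        'PET': 5,             # pet
        'UNKNOWN': 6          # unknown
    }
    # Reverse lookup: state id -> state name
    STATE_NAMES = {v: k for k, v in DETECTION_STATES.items()}

    def __init__(self):
        # Sensors
        self.radar = Radar60GHz('IWR6843AOP')
        self.camera = RGBIRCamera('OV2311')

        # Radar feature stem, built once so its weights persist across frames
        self.radar_stem = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3)

        # Camera backbone: pretrained ResNet-18 without avgpool and fc,
        # which outputs 512 channels
        resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.camera_backbone = nn.Sequential(*list(resnet.children())[:-2])
        self.camera_backbone.eval()

        # Fusion model; camera_dim must match the 512-channel backbone output
        self.fusion_model = FeatureFusionModule(
            radar_dim=64,
            camera_dim=512,
            fused_dim=128
        )
        self.fusion_model.eval()

        # Classifier. Like the radar stem and the fusion model, it has to be
        # trained before its outputs mean anything.
        self.classifier = nn.Linear(128, len(self.DETECTION_STATES))

        # Vital-signs detection
        self.vital_signs = VitalSignsDetector()

    def detect(self) -> dict:
        """
        Run CPD detection.

        Returns:
            {
                'state': detection state name,
                'state_id': detection state id,
                'confidence': confidence,
                'vital_signs': {heart_rate, breathing_rate, confidence},
                'alarm_needed': whether an alarm should be raised
            }
        """
        # Acquire data
        radar_data = self.radar.capture()
        camera_data = self.camera.capture()

        with torch.no_grad():
            # Feature extraction
            radar_feat = self._extract_radar_features(radar_data)
            camera_feat = self._extract_camera_features(camera_data)

            # Fusion
            fused_feat = self.fusion_model(radar_feat, camera_feat)

            # Classification on globally averaged features
            class_logits = self.classifier(fused_feat.mean(dim=[2, 3]))
            class_probs = F.softmax(class_logits, dim=1)

        state_id = class_probs.argmax(dim=1).item()
        confidence = class_probs.max().item()

        # Vital signs
        vital_signs = self.vital_signs.detect(radar_data)

        # Decide whether to raise an alarm. heart_rate is 0 only when the
        # vital-signs detector had no usable radar time series.
        alarm_needed = (
            state_id in [1, 2, 3, 5] and   # child or pet
            vital_signs['heart_rate'] > 0  # vital signs detected
        )

        return {
            'state': self.STATE_NAMES[state_id],
            'state_id': state_id,
            'confidence': confidence,
            'vital_signs': vital_signs,
            'alarm_needed': alarm_needed
        }

    def _extract_radar_features(self, radar_data: dict) -> torch.Tensor:
        """Extract radar features from the range-Doppler map."""
        rd_map = radar_data['range_doppler']

        # (H, W) array -> (1, 1, H, W) tensor
        rd_tensor = torch.from_numpy(rd_map).float().unsqueeze(0).unsqueeze(0)

        # Simple CNN stem
        return self.radar_stem(rd_tensor)

    def _extract_camera_features(self, camera_data: dict) -> torch.Tensor:
        """Extract camera features with the pretrained ResNet trunk."""
        # (H, W, 3) uint8 array -> (1, 3, H, W) float tensor in [0, 1].
        # ImageNet mean/std normalization is omitted here for brevity.
        image = torch.from_numpy(camera_data['rgb']).float()
        image = image.permute(2, 0, 1).unsqueeze(0) / 255.0

        return self.camera_backbone(image)


class VitalSignsDetector:
    """Vital-signs detection."""

    def detect(self, radar_data: dict) -> dict:
        """
        Detect vital signs.

        Uses radar micro-Doppler features to estimate:
        - heart rate (40-200 bpm)
        - breathing rate (10-40 bpm)
        """
        empty = {'heart_rate': 0, 'breathing_rate': 0, 'confidence': 0}

        # Range-Doppler-time cube, shape (range, doppler, time)
        rdt = radar_data.get('range_doppler_time', None)

        if rdt is None:
            return empty

        # FFT along the time axis
        fft_result = np.fft.fft(rdt, axis=2)
        freqs = np.fft.fftfreq(rdt.shape[2], d=1/30)  # 30 fps frame rate

        # Heart-rate band (0.67-3.33 Hz = 40-200 bpm)
        heart_mask = (np.abs(freqs) >= 0.67) & (np.abs(freqs) <= 3.33)
        # Breathing band (0.17-0.67 Hz = 10-40 bpm)
        breath_mask = (np.abs(freqs) >= 0.17) & (np.abs(freqs) <= 0.67)

        if not heart_mask.any() or not breath_mask.any():
            # Time window too short to resolve these bands
            return empty

        heart_power = np.sum(np.abs(fft_result[:, :, heart_mask]), axis=(0, 1))
        breath_power = np.sum(np.abs(fft_result[:, :, breath_mask]), axis=(0, 1))

        # Estimate heart rate from the strongest bin in the band
        heart_freq = freqs[heart_mask][np.argmax(heart_power)]
        heart_rate = np.abs(heart_freq) * 60  # Hz -> bpm

        # Estimate breathing rate the same way
        breath_freq = freqs[breath_mask][np.argmax(breath_power)]
        breathing_rate = np.abs(breath_freq) * 60

        return {
            'heart_rate': float(heart_rate),
            'breathing_rate': float(breathing_rate),
            # Heuristic scaling of peak power to [0, 1]
            'confidence': float(min(heart_power.max() / 1000, 1.0))
        }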
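
To make the band-peak idea in VitalSignsDetector concrete, here is a dependency-free worked example on a synthetic chest-displacement signal. It is only a sketch: the signal (a 0.3 Hz breathing component plus a ten times weaker 1.2 Hz heartbeat component), the 20-second window and the helper name `band_peak_bpm` are assumptions for illustration, and a naive DFT stands in for the FFT. The 30 fps frame rate and the two frequency bands are taken from the code above.

```python
import cmath
import math
from typing import Sequence


def band_peak_bpm(signal: Sequence[float], fs: float,
                  f_lo: float, f_hi: float) -> float:
    """Return the strongest frequency inside [f_lo, f_hi], in beats per minute.

    Uses a naive DFT over the positive-frequency bins in the band and
    returns 0.0 if the window is too short to place any bin in the band.
    """
    n = len(signal)
    best_freq, best_mag = 0.0, -1.0
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        if not (f_lo <= freq <= f_hi):
            continue
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        if abs(coeff) > best_mag:
            best_freq, best_mag = freq, abs(coeff)
    return best_freq * 60.0


def resolution_bpm(fs: float, n: int) -> float:
    """Spacing between adjacent DFT bins, expressed in bpm."""
    return 60.0 * fs / n


fs, n = 30.0, 600  # 30 frames per second, 20-second window
chest = [1.0 * math.sin(2 * math.pi * 0.3 * t / fs) +
         0.1 * math.sin(2 * math.pi * 1.2 * t / fs)
         for t in range(n)]

breathing_bpm = band_peak_bpm(chest, fs, 0.17, 0.67)
heart_bpm = band_peak_bpm(chest, fs, 0.67, 3.33)
```

With these inputs the peaks fall at 18 bpm and 72 bpm, and the bin spacing is 3 bpm. Halving the window to 10 seconds would double that spacing to 6 bpm, which is why vital-sign estimation needs far longer observation windows than frame-by-frame classification.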

2. OMS: Occupant Monitoring

from typing import List


class OMSDetector:
    """
    Occupant monitoring system.

    Fuses radar and camera for:
    - occupant counting
    - position detection
    - seat-belt detection
    - vital-signs monitoring
    """

    def __init__(self):
        self.fusion = RadarCameraFusion(FusionLevel.HYBRID)
        self.seat_zones = self._define_seat_zones()

    def _define_seat_zones(self) -> dict:
        """Define the seat zones as axis-aligned boxes (metres).

        Note that rear_center overlaps rear_left and rear_right in x, and
        neighbouring zones share their boundary, so one occupant can be
        reported in more than one zone.
        """
        return {
            'driver': {'x': [-0.5, 0.5], 'y': [0, 1.0], 'z': [0, 1.5]},
            'front_passenger': {'x': [0.5, 1.5], 'y': [0, 1.0], 'z': [0, 1.5]},
            'rear_left': {'x': [-0.5, 0.5], 'y': [1.0, 2.0], 'z': [0, 1.5]},
            'rear_right': {'x': [0.5, 1.5], 'y': [1.0, 2.0], 'z': [0, 1.5]},
            'rear_center': {'x': [0, 1.0], 'y': [1.0, 2.0], 'z': [0, 1.5]}
        }

    def count_occupants(self, fused_objects: List[dict]) -> int:
        """Count the occupants."""
        count = 0
        for obj in fused_objects:
            if obj['class'] in ['adult', 'child', 'pet']:
                count += 1
        return count

    def localize_occupants(self, fused_objects: List[dict]) -> dict:
        """Assign each occupant to the seat zones that contain it."""
        localization = {}

        for zone_name, zone_bounds in self.seat_zones.items():
            occupants = []

            for obj in fused_objects:
                pos = obj.get('position', None)
                if pos is None:
                    continue

                # Check whether the position lies inside the zone
                in_zone = (
                    zone_bounds['x'][0] <= pos[0] <= zone_bounds['x'][1] and
                    zone_bounds['y'][0] <= pos[1] <= zone_bounds['y'][1] and
                    zone_bounds['z'][0] <= pos[2] <= zone_bounds['z'][1]
                )

                if in_zone:
                    occupants.append(obj)

            localization[zone_name] = occupants

        return localization
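
Because the seat zones defined above overlap (the rear-center box spans half of each outer rear seat), a single occupant can show up in two zones. One simple way to get a unique answer, sketched here under the assumption that the zone whose center is closest should win, is to pick the containing zone with the smallest center distance. The helper names are hypothetical and the zone format matches _define_seat_zones.

```python
from typing import Dict, List, Optional, Sequence

Zone = Dict[str, List[float]]


def contains(zone: Zone, pos: Sequence[float]) -> bool:
    """True if pos = (x, y, z) lies inside the axis-aligned zone box."""
    return all(zone[axis][0] <= p <= zone[axis][1]
               for axis, p in zip("xyz", pos))


def center_distance_sq(zone: Zone, pos: Sequence[float]) -> float:
    """Squared distance from pos to the center of the zone box."""
    return sum((p - (zone[axis][0] + zone[axis][1]) / 2) ** 2
               for axis, p in zip("xyz", pos))


def assign_zone(zones: Dict[str, Zone], pos: Sequence[float]) -> Optional[str]:
    """Return the single best zone for pos, or None if no zone contains it."""
    candidates = [name for name, zone in zones.items() if contains(zone, pos)]
    if not candidates:
        return None
    return min(candidates, key=lambda name: center_distance_sq(zones[name], pos))
```

For example, with the rear-row zones from the post, a detection at x = 0.4 m lies in both rear_left and rear_center but is closer to the center of rear_center, so that zone is returned.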

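Dataset Comparison

In-cabin sensing datasets

Dataset | Sensors | Samples | Annotation types | Application
Valeo Cabin Sensing | Radar + camera | 50K+ | Occupant position/pose | OMS
TI mmWave In-Cabin | 60 GHz radar | 20K+ | Vital signs/CPD | CPD
Anyverse Synthetic | Synthetic camera | 100K+ | Varied lighting/occlusion | DMS/OMS
Euro NCAP Test Set | Multi-sensor | Proprietary | CPD scenarios | Certification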

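Deployment Guide

1. Hardware Configuration

Recommended configuration:

Component | Model | Specifications | Purpose
60 GHz radar | TI IWR6843AOP | 3 TX / 4 RX, 60 GHz | CPD / vital signs
ToF depth camera | Sony IMX316 | 640×480, 30 fps | Range measurement
RGB-IR camera | OV2311 | 2 MP, global shutter | Visual recognition
Processor | QCS8255 | Hexagon NPU, 26 TOPS | Edge inference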

2. Performance Optimization

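import time

import numpy as np
import onnxruntime as ort


class OptimizedCPD:
    """
    Optimized CPD deployment.

    Target metrics:
    - Latency < 50 ms
    - Frame rate > 20 fps
    - Power < 5 W
    """

    def __init__(self):
        # NPU acceleration through the QNN execution provider, with a CPU
        # fallback for builds where it is unavailable
        self.providers = ['QNNExecutionProvider', 'CPUExecutionProvider']

        # INT8-quantized models
        self.radar_model = self._load('radar_cpd_int8.onnx')
        self.camera_model = self._load('camera_cpd_int8.onnx')
        self.fusion_model = self._load('fusion_int8.onnx')

    def _load(self, path: str) -> ort.InferenceSession:
        """Open an ONNX Runtime session on the configured providers."""
        return ort.InferenceSession(path, providers=self.providers)

    def infer_optimized(self, radar_data: np.ndarray,
                        camera_data: np.ndarray) -> dict:
        """Run optimized inference and report the end-to-end latency.

        Each model's input tensor is assumed to be named 'input'.
        """
        start = time.perf_counter()

        # Radar branch (the two branches run one after the other here;
        # running them in parallel would need separate threads or streams)
        radar_feat = self.radar_model.run(None, {'input': radar_data})

        # Camera branch
        camera_feat = self.camera_model.run(None, {'input': camera_data})

        # Fusion: concatenate along the channel axis, which requires the two
        # feature maps to share batch and spatial dimensions
        fused_input = np.concatenate([radar_feat[0], camera_feat[0]], axis=1)
        output = self.fusion_model.run(None, {'input': fused_input})

        latency = (time.perf_counter() - start) * 1000

        return {
            'prediction': output[0],
            'latency_ms': latency
        }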
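
A single latency reading says little about whether the 50 ms target is met, because the first few inferences are usually slower and occasional outliers matter for a safety function. The harness below is a minimal sketch of a more robust check: it discards warm-up runs and reports the median and a nearest-rank 95th percentile. The names `benchmark` and `meets_budget`, and the choice to compare the 95th percentile against the budget, are assumptions for this example rather than anything prescribed by the post.

```python
import math
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], warmup: int = 5,
              runs: int = 50) -> Dict[str, float]:
    """Time fn() repeatedly and return latency statistics in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up runs are not recorded

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    # Nearest-rank 95th percentile
    p95_index = min(len(samples) - 1, math.ceil(0.95 * len(samples)) - 1)
    return {
        "mean_ms": statistics.fmean(samples),
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "max_ms": samples[-1],
    }


def meets_budget(stats: Dict[str, float], budget_ms: float = 50.0) -> bool:
    """Check the 95th-percentile latency against the budget."""
    return stats["p95_ms"] < budget_ms
```

In use, fn would be a closure around a call such as infer_optimized on a fixed pair of input arrays, measured on the target hardware rather than a development machine.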

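Development Takeaways

1. Choosing a Fusion Strategy

Scenario | Recommended fusion level | Reason
CPD vital signs | Decision-level | The radar detects breathing and heartbeat on its own
OMS occupant localization | Feature-level | Spatial association is required
DMS distraction detection | Early fusion | A single sensor is sufficient
Whole-cabin monitoring | Hybrid fusion | Multi-task requirements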

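2. Key Technical Points

  1. Sensor calibration: accurate extrinsic calibration is a prerequisite for fusion
  2. Time synchronization: radar and camera timestamps must be aligned
  3. Occlusion handling: radar penetration compensates for occluded camera views
  4. False-alarm control: cross-validation between sensors reduces false alarms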
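
The time-synchronization point can be illustrated with a small software-side pairing routine: for each radar frame, find the camera frame with the nearest timestamp and keep the pair only if the gap is within a tolerance. This is a sketch under stated assumptions. Both timestamp lists are assumed to be in seconds on a shared clock, the camera list is assumed sorted, and the 20 ms default tolerance and the function name are illustrative choices; hardware triggering, where available, avoids the problem at the source.

```python
import bisect
from typing import List, Sequence, Tuple


def match_timestamps(radar_ts: Sequence[float],
                     camera_ts: Sequence[float],
                     tolerance_s: float = 0.02) -> List[Tuple[int, int]]:
    """Pair each radar frame with the nearest camera frame in time.

    camera_ts must be sorted in ascending order. Returns a list of
    (radar_index, camera_index) pairs; radar frames with no camera frame
    within tolerance_s are dropped.
    """
    pairs = []
    for i, t in enumerate(radar_ts):
        pos = bisect.bisect_left(camera_ts, t)
        best = None
        # The nearest neighbour is either just before or just after pos
        for j in (pos - 1, pos):
            if 0 <= j < len(camera_ts):
                if best is None or abs(camera_ts[j] - t) < abs(camera_ts[best] - t):
                    best = j
        if best is not None and abs(camera_ts[best] - t) <= tolerance_s:
            pairs.append((i, best))
    return pairs
```

Frames that fail the tolerance check are better dropped than fused, since a stale camera frame paired with a fresh radar frame misplaces moving occupants.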

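3. Euro NCAP Compliance

  • ✅ Radar CPD: detects a child's breathing motion
  • ✅ Camera OMS: occupant classification and position detection
  • ✅ Fusion verification: double confirmation reduces false alarms
  • ✅ Real-time operation: NPU acceleration meets the latency requirement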
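
The double-confirmation idea can be combined with a temporal vote so that a single noisy frame neither raises nor clears an alarm. The class below is a minimal sketch: an alarm is reported only when both sensors agree in at least `required` of the last `window` frames. The class name and the default window sizes are assumptions for illustration, not values from the Euro NCAP protocol. Requiring both sensors lowers false alarms at the cost of missed detections when the camera view is blocked, so a production system would likely relax the camera condition in that case.

```python
from collections import deque


class AlarmDebouncer:
    """K-of-N temporal vote over per-frame radar/camera agreement."""

    def __init__(self, window: int = 10, required: int = 8):
        if not 0 < required <= window:
            raise ValueError("required must be in 1..window")
        self.required = required
        # Only the most recent `window` votes are kept
        self.history = deque(maxlen=window)

    def update(self, radar_positive: bool, camera_positive: bool) -> bool:
        """Record one frame and return whether the alarm condition holds."""
        self.history.append(radar_positive and camera_positive)
        return sum(self.history) >= self.required
```

Each call to update() corresponds to one fused frame, for example the radar vital-sign flag and the camera classification from the CPD detector.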

References:

  1. Shi, K., et al. (2024). Radar and Camera Fusion for Object Detection and Tracking: A Comprehensive Survey. arXiv:2410.19872.
  2. Euro NCAP Child Presence Detection Protocol 2026.
  3. TI IWR6843AOP Technical Reference Manual.

Source: https://dapalm.com/2026/06/16/2026-06-16-Radar-Camera-Fusion-Survey-CPD-OMS-Application/
Author: Mars
Published: June 16, 2026