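Radar-Camera Fusion for Object Detection and Tracking: A Survey


Core Contribution

The first systematic survey of radar-camera fusion for detection and tracking

Problem: multimodal fusion is key to reliable perception in complex environments, yet radar-camera fusion has lacked a systematic treatment

Solution

  1. Comprehensive taxonomy: sensor calibration, modality representation, data alignment, and fusion operations
  2. Detailed task breakdown: deep learning methods for object detection and tracking
  3. Open-problem analysis: model robustness, modality uncertainty, and handling of missing modalities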

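Sensor Characteristics Compared

Multi-Sensor Performance Comparison

Sensor  Semantic information  Range measurement  Angular resolution  Velocity measurement  Lighting robustness  Weather robustness
Ultrasonic
Infrared
LiDAR
RGB camera
Millimeter-wave radar

Key finding: among the sensors compared, millimeter-wave radar is the only one that remains effective across all weather, lighting, and temperature conditions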

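Radar-Camera vs. LiDAR-Camera

graph TB
    subgraph RC["Radar-camera fusion"]
        A1[All-weather operation] --> A2[Long-range measurement]
        A2 --> A3[Instantaneous velocity]
        A3 --> A4[Low cost in volume production]
    end
    
    subgraph LC["LiDAR-camera fusion"]
        B1[Fails in adverse weather] --> B2[Short to medium range]
        B2 --> B3[No velocity information]
        B3 --> B4[High cost]
    end
    
    style RC fill:#4a4
    style LC fill:#a44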

Fusion Architecture in Detail

1. Sensor Calibration

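import numpy as np
import cv2
from typing import Tuple, Optional


class RadarCameraCalibration:
    """
    Radar-camera calibration.

    Purpose:
    - Estimate the transform from the radar frame to the camera frame
    - Project radar points onto the image plane

    Methods:
    1. Target-based calibration
    2. Target-free calibration
    """

    def __init__(self,
                 radar_intrinsic: np.ndarray,
                 camera_intrinsic: np.ndarray,
                 dist_coeffs: np.ndarray):
        """
        Store the calibration parameters.

        Args:
            radar_intrinsic: radar intrinsic matrix (3x3); stored but unused below
            camera_intrinsic: camera intrinsic matrix (3x3)
            dist_coeffs: distortion coefficients (5,)
        """
        self.radar_intrinsic = radar_intrinsic
        self.camera_intrinsic = camera_intrinsic
        self.dist_coeffs = dist_coeffs

        # Extrinsics: radar frame -> camera frame
        self.rotation_matrix = np.eye(3)       # rotation matrix
        self.translation_vector = np.zeros(3)  # translation vector

    def target_based_calibration(self,
                                 radar_points: np.ndarray,
                                 image_corners: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
        """
        Target-based calibration using corner reflectors as targets.

        Args:
            radar_points: reflector positions measured by the radar (N, 3)
            image_corners: matching pixel positions in the image (N, 2)

        Returns:
            rotation_matrix: rotation matrix (3x3)
            translation_vector: translation vector (3,)
        """
        # Solve the PnP problem; OpenCV expects floating-point arrays
        success, rvec, tvec = cv2.solvePnP(
            np.asarray(radar_points, dtype=np.float64),
            np.asarray(image_corners, dtype=np.float64),
            np.asarray(self.camera_intrinsic, dtype=np.float64),
            np.asarray(self.dist_coeffs, dtype=np.float64)
        )

        if not success:
            raise RuntimeError("solvePnP failed to find a pose")

        self.rotation_matrix, _ = cv2.Rodrigues(rvec)
        self.translation_vector = tvec.flatten()

        return self.rotation_matrix, self.translation_vector

    def project_radar_to_image(self,
                               radar_point: np.ndarray) -> Optional[Tuple[int, int]]:
        """
        Project a radar point onto the image plane (lens distortion is ignored).

        Args:
            radar_point: point in the radar frame (3,) [x, y, z] in meters

        Returns:
            image_point: pixel coordinates (u, v), or None if the point is behind the camera
        """
        # Radar frame -> camera frame
        camera_point = self.rotation_matrix @ radar_point + self.translation_vector

        # Reject points that are not in front of the camera
        if camera_point[2] <= 0:
            return None

        # Camera frame -> pixel coordinates
        image_point_h = self.camera_intrinsic @ camera_point
        u = int(image_point_h[0] / image_point_h[2])
        v = int(image_point_h[1] / image_point_h[2])

        return (u, v)

    def target_free_calibration(self,
                                radar_detections: list,
                                image_detections: list) -> np.ndarray:
        """
        Target-free calibration using natural objects in the scene.

        Args:
            radar_detections: list of radar detections (dicts with 'position')
            image_detections: list of image detections (dicts with 'bbox_center')

        Returns:
            projection_matrix: radar-to-pixel projection matrix (3x4), defined up to scale
        """
        # Collect matched radar/image pairs
        matched_pairs = []
        for radar_det in radar_detections:
            for image_det in image_detections:
                if self._is_same_object(radar_det, image_det):
                    matched_pairs.append((radar_det, image_det))

        # The 3x4 matrix has 12 entries (11 degrees of freedom) and each
        # pair contributes 2 equations, so at least 6 pairs are needed
        if len(matched_pairs) < 6:
            raise ValueError("At least 6 matched point pairs are required")

        # Build the homogeneous DLT system A p = 0
        A = []
        for radar_det, image_det in matched_pairs:
            rx, ry, rz = radar_det['position']     # radar coordinates
            u, v = image_det['bbox_center']        # pixel coordinates

            A.append([rx, ry, rz, 1, 0, 0, 0, 0, -u*rx, -u*ry, -u*rz, -u])
            A.append([0, 0, 0, 0, rx, ry, rz, 1, -v*rx, -v*ry, -v*rz, -v])

        A = np.array(A, dtype=np.float64)

        # Least-squares solution: right singular vector of the smallest singular value
        _, _, Vt = np.linalg.svd(A)
        P = Vt[-1].reshape(3, 4)

        return P

    def _is_same_object(self, radar_det, image_det) -> bool:
        """Decide whether a radar and an image detection belong to the same object."""
        # Placeholder: real matching logic goes here. Returning True pairs
        # every radar detection with every image detection.
        return True


# Calibration smoke test
if __name__ == "__main__":
    # Synthetic parameters
    radar_intrinsic = np.eye(3)
    camera_intrinsic = np.array([
        [1000, 0, 640],
        [0, 1000, 360],
        [0, 0, 1]
    ])
    dist_coeffs = np.zeros(5)

    calibrator = RadarCameraCalibration(radar_intrinsic, camera_intrinsic, dist_coeffs)

    # Assumed axis conventions: radar x forward, y left, z up;
    # camera x right, y down, z forward. With the default identity
    # rotation the point below would have zero depth and be rejected.
    calibrator.rotation_matrix = np.array([
        [0.0, -1.0, 0.0],
        [0.0, 0.0, -1.0],
        [1.0, 0.0, 0.0]
    ])

    # Projection test
    radar_point = np.array([10.0, 2.0, 0.0])  # 10 m ahead, 2 m to the left
    image_point = calibrator.project_radar_to_image(radar_point)
    print(f"Radar point {radar_point} -> image point {image_point}")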
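
A common way to sanity-check estimated extrinsics is to reproject the radar targets and measure the pixel error against the observed image points. The sketch below is a NumPy-only illustration under a plain pinhole model with no lens distortion; the helper names `project_points` and `reprojection_rmse`, the axis conventions, and all numeric values are assumptions made for this example, not something taken from the survey.

```python
import numpy as np

def project_points(points_radar, R, t, K):
    """Project (N, 3) radar-frame points to (N, 2) pixels with a pinhole model."""
    cam = points_radar @ R.T + t          # radar frame -> camera frame
    uvw = cam @ K.T                       # camera frame -> homogeneous pixels
    return uvw[:, :2] / uvw[:, 2:3]       # perspective division

def reprojection_rmse(points_radar, pixels, R, t, K):
    """Root-mean-square pixel distance between projected and observed points."""
    residual = project_points(points_radar, R, t, K) - pixels
    return float(np.sqrt(np.mean(np.sum(residual ** 2, axis=1))))

K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
# Assumed axes: radar x forward, y left, z up; camera x right, y down, z forward.
R = np.array([[0.0, -1.0, 0.0],
              [0.0, 0.0, -1.0],
              [1.0, 0.0, 0.0]])
t = np.array([0.0, 0.5, 0.0])

targets = np.array([[10.0, 2.0, 0.0],
                    [20.0, -3.0, 1.0],
                    [15.0, 0.0, 0.5],
                    [30.0, 4.0, -0.5]])
observed = project_points(targets, R, t, K)

# Perfect extrinsics reproduce the observations; a constant pixel offset
# in the observations shows up directly in the error.
rmse_exact = reprojection_rmse(targets, observed, R, t, K)
rmse_shifted = reprojection_rmse(targets, observed + np.array([3.0, 4.0]), R, t, K)
print(rmse_exact, rmse_shifted)
```

In practice the acceptable error depends on image resolution and target range, so any threshold would need to be chosen per setup.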

2. Modality Fusion Representation

import torch
import torch.nn as nn
import torch.nn.functional as F


class ProjectionBasedFusion(nn.Module):
    """
    Projection-based fusion representation.

    Projects the radar point cloud onto the image plane and fuses it
    with the visual features.

    Pros:
    - Simple to implement
    - Computationally efficient

    Cons:
    - Loses the radar height information
    - A sparse point cloud carries little information once projected
    """

    def __init__(self,
                 image_channels: int = 3,
                 radar_channels: int = 5,  # x, y, z, v_r, rcs
                 feature_dim: int = 64):
        super().__init__()

        # Image feature extractor
        self.image_encoder = nn.Sequential(
            nn.Conv2d(image_channels, 32, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, feature_dim, 3, padding=1)
        )

        # Per-point radar embedding
        self.radar_embedding = nn.Linear(radar_channels, feature_dim)

        # Fusion layer
        self.fusion_conv = nn.Conv2d(feature_dim * 2, feature_dim, 1)

    def forward(self,
                image: torch.Tensor,
                radar_points: torch.Tensor,
                radar_pixels: torch.Tensor) -> torch.Tensor:
        """
        Forward pass.

        Args:
            image: image (B, C, H, W)
            radar_points: radar point cloud (B, N, 5) [x, y, z, v_r, rcs]
            radar_pixels: projected pixel coordinates of the radar points (B, N, 2) as (u, v)

        Returns:
            fused_features: fused features (B, D, H, W)
        """
        B, C, H, W = image.shape
        N = radar_points.shape[1]

        # Image features
        img_features = self.image_encoder(image)  # (B, D, H, W)

        # Radar features
        radar_features = self.radar_embedding(radar_points)  # (B, N, D)

        # Empty radar feature map on the image grid
        radar_feature_map = torch.zeros(B, radar_features.shape[-1], H, W,
                                        device=image.device)

        # Write each radar feature into its pixel (later points overwrite earlier ones)
        for b in range(B):
            for n in range(N):
                u, v = radar_pixels[b, n].long()
                if 0 <= u < W and 0 <= v < H:
                    radar_feature_map[b, :, v, u] = radar_features[b, n]

        # Concatenate and fuse
        combined = torch.cat([img_features, radar_feature_map], dim=1)
        fused = self.fusion_conv(combined)

        return fused


class BEVBasedFusion(nn.Module):
    """
    Bird's-eye-view (BEV) fusion representation.

    Transforms both the image and the radar data into BEV space.

    Pros:
    - Preserves spatial relationships
    - Well suited to multi-object tracking

    Cons:
    - Depends on inaccurate depth estimation
    - Computationally expensive
    """

    def __init__(self,
                 image_size: tuple = (720, 1280),
                 bev_size: tuple = (200, 200),
                 feature_dim: int = 64):
        super().__init__()

        self.image_size = image_size
        self.bev_size = bev_size

        # Image branch (features later lifted to BEV)
        self.image_to_bev = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1),
            nn.ReLU(),
        )

        # Depth estimation head
        self.depth_net = nn.Sequential(
            nn.Conv2d(64, 64, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, 1, 1)  # depth prediction
        )

        # Radar BEV encoder
        self.radar_to_bev = nn.Linear(5, feature_dim)  # x, y, z, v_r, rcs

        # BEV fusion
        self.bev_fusion = nn.Sequential(
            nn.Conv2d(64 + feature_dim, 128, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(128, feature_dim, 3, padding=1)
        )

    def forward(self,
                image: torch.Tensor,
                radar_points: torch.Tensor) -> torch.Tensor:
        """
        Forward pass.

        Args:
            image: image (B, C, H, W)
            radar_points: radar point cloud (B, N, 5)

        Returns:
            bev_features: BEV features (B, D, BH, BW)
        """
        # Image features
        img_features = self.image_to_bev(image)

        # Depth estimation
        depth = self.depth_net(img_features)

        # Image -> BEV (simplified stand-in for Lift-Splat)
        bev_from_image = self._lift_splat(img_features, depth)

        # Radar -> BEV
        radar_features = self.radar_to_bev(radar_points)
        bev_from_radar = self._radar_to_bev_map(radar_features, radar_points)

        # Fusion
        combined = torch.cat([bev_from_image, bev_from_radar], dim=1)
        bev_features = self.bev_fusion(combined)

        return bev_features

    def _lift_splat(self, img_features, depth):
        """Lift image features into BEV space (placeholder)."""
        # Stand-in only: the feature map is resized to the BEV grid so that
        # shapes line up. A real Lift-Splat uses the predicted depth together
        # with the camera intrinsics and extrinsics; `depth` is unused here.
        BH, BW = self.bev_size
        bev = F.interpolate(img_features, size=(BH, BW),
                            mode='bilinear', align_corners=False)

        return bev

    def _radar_to_bev_map(self, features, points):
        """Rasterize radar point features into a BEV feature map."""
        B, N, D = features.shape
        BH, BW = self.bev_size

        bev = torch.zeros(B, D, BH, BW, device=features.device)

        # Assumed BEV extent: [-50 m, 50 m] x [-50 m, 50 m]
        range_m = 50.0

        for b in range(B):
            for n in range(N):
                # Radar coordinates -> BEV cell (floor keeps out-of-range
                # negative values from landing in cell 0)
                x, y = points[b, n, 0], points[b, n, 1]
                bev_x = int(torch.floor((x + range_m) / (2 * range_m) * BW))
                bev_y = int(torch.floor((y + range_m) / (2 * range_m) * BH))

                if 0 <= bev_x < BW and 0 <= bev_y < BH:
                    bev[b, :, bev_y, bev_x] = features[b, n]

        return bev


# Fusion smoke test
if __name__ == "__main__":
    # Projection-based fusion (a small image keeps the demo light on memory)
    proj_fusion = ProjectionBasedFusion()

    image = torch.randn(2, 3, 180, 320)
    radar = torch.randn(2, 100, 5)
    pixels = torch.rand(2, 100, 2) * torch.tensor([320.0, 180.0])  # (u, v) inside the image

    features = proj_fusion(image, radar, pixels)
    print(f"Projection fusion output: {features.shape}")

    # BEV-based fusion
    bev_fusion = BEVBasedFusion(image_size=(180, 320))
    bev_features = bev_fusion(image, radar)
    print(f"BEV fusion output: {bev_features.shape}")
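
The per-point Python loops above are easy to read but slow for large point clouds. As a rough illustration of the same metric-to-grid mapping done in one vectorized step, here is a NumPy sketch. The function name `rasterize_to_bev` is hypothetical, and unlike `_radar_to_bev_map` it sums the features of points that share a cell rather than letting the last one overwrite; that is a design choice of this sketch, not something the survey prescribes.

```python
import numpy as np

def rasterize_to_bev(points_xy, features, bev_size=(200, 200), range_m=50.0):
    """Scatter (N, D) point features into a (D, BH, BW) grid over [-range_m, range_m)."""
    bh, bw = bev_size
    grid_hw = np.zeros((bh, bw, features.shape[1]), dtype=features.dtype)

    # Metric coordinates -> integer cell indices.
    col = np.floor((points_xy[:, 0] + range_m) / (2 * range_m) * bw).astype(int)
    row = np.floor((points_xy[:, 1] + range_m) / (2 * range_m) * bh).astype(int)

    # Keep only points that land inside the grid.
    inside = (col >= 0) & (col < bw) & (row >= 0) & (row < bh)

    # np.add.at accumulates repeated indices; plain fancy-index assignment
    # would keep only one of the points that fall into the same cell.
    np.add.at(grid_hw, (row[inside], col[inside]), features[inside])

    return grid_hw.transpose(2, 0, 1)   # channels first, like the torch code

points = np.array([[0.0, 0.0], [0.0, 0.0], [-50.0, -50.0], [60.0, 0.0], [49.9, 49.9]])
feats = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]])
bev = rasterize_to_bev(points, feats, bev_size=(4, 4))
print(bev.shape)
```

Summation is only one pooling option; taking the maximum or the mean per cell are equally plausible choices.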

3. Data Alignment

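from typing import Optional

import numpy as np
import torch
import torch.nn.functional as F


class DataAlignment:
    """
    Data alignment.

    Handles time synchronization and spatial alignment between radar
    and camera data.

    Alignment types:
    1. Explicit alignment: applied directly from calibration parameters
    2. Implicit alignment: learned by the network
    """

    def __init__(self,
                 time_offset: float = 0.0,
                 spatial_transform: Optional[np.ndarray] = None):
        """
        Args:
            time_offset: radar-camera time offset in seconds
            spatial_transform: spatial transform matrix (4x4)
        """
        self.time_offset = time_offset
        # `spatial_transform or np.eye(4)` would raise for array inputs,
        # so the None case is tested explicitly
        self.spatial_transform = np.eye(4) if spatial_transform is None else spatial_transform

    def explicit_alignment(self,
                           radar_data: dict,
                           camera_data: dict,
                           timestamp: float) -> tuple:
        """
        Explicit alignment.

        Uses calibration parameters for time synchronization and the
        spatial transform.

        Args:
            radar_data: radar data dict (keys 'timestamp', 'points')
            camera_data: camera data dict (keys 'timestamp', 'image')
            timestamp: current timestamp

        Returns:
            aligned_radar: aligned radar points
            aligned_camera: camera image
        """
        # Time synchronization
        radar_time = radar_data['timestamp']
        camera_time = camera_data['timestamp']

        # Interpolate when the residual offset exceeds the 50 ms tolerance
        if abs(radar_time - camera_time - self.time_offset) > 0.05:
            radar_points = self._interpolate_radar(
                radar_data,
                camera_time + self.time_offset
            )
        else:
            radar_points = radar_data['points']

        # Spatial transform
        aligned_points = self._transform_points(radar_points)

        return aligned_points, camera_data['image']

    def implicit_alignment(self,
                           radar_features: torch.Tensor,
                           camera_features: torch.Tensor) -> torch.Tensor:
        """
        Implicit alignment.

        Learns the alignment with an attention mechanism.

        Args:
            radar_features: radar features (B, N, D)
            camera_features: camera features (B, H*W, D)

        Returns:
            aligned_features: camera features gathered per radar point (B, N, D)
        """
        # Scaled dot-product cross-attention (radar queries, camera keys)
        attention = torch.bmm(radar_features, camera_features.transpose(1, 2))
        attention = F.softmax(attention / (radar_features.shape[-1] ** 0.5), dim=-1)

        # Attention-weighted sum of camera features
        aligned = torch.bmm(attention, camera_features)

        return aligned

    def _interpolate_radar(self, radar_data, target_time):
        """Interpolate radar data to the target time."""
        # Placeholder: returns the points unchanged
        return radar_data['points']

    def _transform_points(self, points):
        """Apply the spatial transform to an (N, 3) point cloud."""
        # Homogeneous coordinates
        ones = np.ones((points.shape[0], 1))
        points_h = np.hstack([points, ones])

        # Transform
        transformed = (self.spatial_transform @ points_h.T).T

        return transformed[:, :3]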
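
`_interpolate_radar` is left as a placeholder above. One simple option, sketched here under stated assumptions, is to shift each radar point along its line of sight by its measured radial velocity times the time gap. The function name `compensate_radial_motion` is hypothetical, positive radial velocity is assumed to mean "moving away from the sensor", and ego-motion and tangential motion are ignored, so treat this as a first-order approximation only.

```python
import numpy as np

def compensate_radial_motion(points, radial_velocity, dt):
    """Shift (N, 3) points along their line of sight by radial_velocity * dt.

    A single radar return only observes the radial velocity component, so
    any tangential motion is left uncorrected.
    """
    ranges = np.linalg.norm(points, axis=1, keepdims=True)
    unit = points / np.maximum(ranges, 1e-9)   # guard against a point at the origin
    return points + unit * radial_velocity[:, None] * dt

points = np.array([[10.0, 0.0, 0.0],
                   [0.0, 20.0, 0.0]])
v_r = np.array([-5.0, 2.0])        # m/s: first target approaching, second receding
shifted = compensate_radial_motion(points, v_r, dt=0.04)   # 40 ms gap to the camera frame
print(shifted)
```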

Taxonomy of Fusion Operations

1. Early Fusion

import torch
import torch.nn as nn


class EarlyFusion(nn.Module):
    """
    Early fusion.

    Fuses radar and camera data directly at the data level.

    Pros:
    - Preserves the raw information
    - Allows end-to-end optimization

    Cons:
    - Large gap between the modalities
    - Sensitive to noise
    """

    def __init__(self,
                 radar_channels: int = 5,
                 image_channels: int = 3,
                 fusion_channels: int = 8):
        super().__init__()

        # Expand the radar channels
        self.radar_expand = nn.Linear(radar_channels, fusion_channels)

        # Fusion convolution
        self.fusion_conv = nn.Conv2d(
            image_channels + fusion_channels,
            64, 3, padding=1
        )

    def forward(self,
                image: torch.Tensor,
                radar_points: torch.Tensor,
                radar_pixels: torch.Tensor) -> torch.Tensor:
        """
        Args:
            image: (B, 3, H, W)
            radar_points: (B, N, 5)
            radar_pixels: (B, N, 2) as (u, v)
        """
        B, _, H, W = image.shape
        N = radar_points.shape[1]

        # Expand the radar features
        radar_features = self.radar_expand(radar_points)  # (B, N, fusion_channels)

        # Radar feature map; the channel count follows fusion_channels
        # instead of a hard-coded 8
        radar_map = torch.zeros(B, radar_features.shape[-1], H, W, device=image.device)
        for b in range(B):
            for n in range(N):
                u, v = radar_pixels[b, n].long()
                if 0 <= u < W and 0 <= v < H:
                    radar_map[b, :, v, u] = radar_features[b, n]

        # Concatenate along the channel axis
        fused = torch.cat([image, radar_map], dim=1)

        # Convolution
        output = self.fusion_conv(fused)

        return output

2. Middle Fusion

import torch
import torch.nn as nn


class MiddleFusion(nn.Module):
    """
    Middle fusion.

    Fuses radar and camera features at the feature level.

    Pros:
    - Works on more abstract features
    - Flexible

    Cons:
    - Feature alignment is difficult
    - Higher computational cost
    """

    def __init__(self, feature_dim: int = 256):
        super().__init__()

        # Image feature extractor
        self.image_backbone = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(128, feature_dim, 3, padding=1)
        )

        # Radar feature extractor
        self.radar_backbone = nn.Sequential(
            nn.Linear(5, 64),
            nn.ReLU(),
            nn.Linear(64, 128),
            nn.ReLU(),
            nn.Linear(128, feature_dim)
        )

        # Feature fusion (feature_dim must be divisible by num_heads)
        self.fusion = nn.MultiheadAttention(
            embed_dim=feature_dim,
            num_heads=8,
            batch_first=True
        )

    def forward(self,
                image: torch.Tensor,
                radar_points: torch.Tensor,
                radar_pixels: torch.Tensor) -> torch.Tensor:
        """
        Args:
            image: (B, 3, H, W)
            radar_points: (B, N, 5)
            radar_pixels: (B, N, 2); kept for interface parity, unused here

        Returns:
            fused: (B, H'*W', D)
        """
        # Image features
        img_features = self.image_backbone(image)  # (B, D, H', W')
        img_features = img_features.flatten(2).transpose(1, 2)  # (B, H'*W', D)

        # Radar features
        radar_features = self.radar_backbone(radar_points)  # (B, N, D)

        # Cross-attention fusion: image tokens query the radar tokens
        fused, _ = self.fusion(
            query=img_features,
            key=radar_features,
            value=radar_features
        )

        return fused

3. Late Fusion

import torch
import torch.nn as nn
import torch.nn.functional as F


class LateFusion(nn.Module):
    """
    Late fusion.

    Fuses the radar and camera results at the decision level.

    Pros:
    - Modalities stay independent
    - Easy to implement

    Cons:
    - Loses cross-modal information
    - Limited fusion gain
    """

    def __init__(self,
                 num_classes: int = 10,
                 feature_dim: int = 256):
        super().__init__()

        # Image branch (a whole-image classifier standing in for a detector)
        self.image_detector = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, feature_dim),
            nn.ReLU(),
            nn.Linear(feature_dim, num_classes)
        )

        # Radar branch
        self.radar_detector = nn.Sequential(
            nn.Linear(5, 64),
            nn.ReLU(),
            nn.Linear(64, feature_dim),
            nn.ReLU(),
            nn.Linear(feature_dim, num_classes)
        )

        # Learnable decision-fusion weights
        self.fusion_weights = nn.Parameter(torch.ones(2) / 2)

    def forward(self,
                image: torch.Tensor,
                radar_points: torch.Tensor) -> torch.Tensor:
        """
        Args:
            image: (B, 3, H, W)
            radar_points: (B, N, 5)

        Returns:
            fused_logits: (B, num_classes)
        """
        # Image branch
        img_logits = self.image_detector(image)  # (B, num_classes)

        # Radar branch (point cloud pooled by its mean)
        radar_global = radar_points.mean(dim=1)  # (B, 5)
        radar_logits = self.radar_detector(radar_global)  # (B, num_classes)

        # Weighted fusion; softmax keeps the two weights positive and summing to 1
        weights = F.softmax(self.fusion_weights, dim=0)
        fused_logits = weights[0] * img_logits + weights[1] * radar_logits

        return fused_logits
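
The class above blends class logits for a whole frame. In detection pipelines, decision-level fusion is often done per object instead: camera boxes are matched with projected radar returns, and each box inherits a range and a velocity. The NumPy sketch below illustrates one naive association rule (take the nearest radar return inside each box). The function name `attach_radar_to_boxes`, the dictionary layout, and the rule itself are assumptions for illustration only.

```python
import numpy as np

def attach_radar_to_boxes(boxes, radar_uv, radar_range, radar_velocity):
    """For each box (x1, y1, x2, y2), pick the nearest radar return inside it.

    Returns one dict per box with 'range' and 'velocity'; both are None
    when no radar return projects into the box.
    """
    fused = []
    for x1, y1, x2, y2 in boxes:
        inside = ((radar_uv[:, 0] >= x1) & (radar_uv[:, 0] <= x2) &
                  (radar_uv[:, 1] >= y1) & (radar_uv[:, 1] <= y2))
        if not inside.any():
            fused.append({"range": None, "velocity": None})
            continue
        idx = np.flatnonzero(inside)
        # Heuristic: the closest return is the least likely to be background clutter.
        nearest = idx[np.argmin(radar_range[idx])]
        fused.append({"range": float(radar_range[nearest]),
                      "velocity": float(radar_velocity[nearest])})
    return fused

boxes = np.array([[100, 100, 200, 200],
                  [300, 300, 400, 400]])
radar_uv = np.array([[150.0, 150.0], [160.0, 150.0], [500.0, 500.0]])
radar_range = np.array([30.0, 12.0, 8.0])
radar_velocity = np.array([1.0, -2.0, 0.0])

result = attach_radar_to_boxes(boxes, radar_uv, radar_range, radar_velocity)
print(result)
```

A production system would more likely solve a global assignment and gate matches by distance; this greedy rule is only meant to show where the two decision streams meet.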

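Dataset Summary

Dataset     Scenario            Radar type                Annotations   Scale
nuScenes    Autonomous driving  5 millimeter-wave radars  3D boxes      1,000 scenes
Waymo       Autonomous driving  1 millimeter-wave radar   3D boxes      1,150 scenes
Astyx       Automotive          High-resolution radar     3D boxes      546 frames
RADIal      Autonomous driving  HD radar                  Point clouds  25K frames
CRUW        Autonomous driving  Radar tensors             Class labels  10K frames
TJ4DRadSet  Autonomous driving  4D radar                  3D boxes      7K frames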

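Implications for IMS Applications

1. Fusion Options for CPD Child Detection

Option    Radar configuration  Camera configuration  Detection accuracy  Cost
Option A  60 GHz, 1TX1RX       RGB-IR                85%
Option B  77 GHz, 2TX4RX       RGB                   92%
Option C  79 GHz, 4TX4RX       RGB-IR + depth        97%

2. Fusion for Occupant Classification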

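# Occupant classification fusion: inputs and strategy overview
occupant_classification_fusion = {
    "radar_inputs": {
        "RCS": "radar cross-section",
        "Doppler": "respiration / heartbeat signals",
        "point_cloud": "body contour",
    },
    "camera_inputs": {
        "RGB": "appearance features",
        "IR": "visibility at night",
        "depth": "spatial information",
    },
    "fusion_strategies": {
        "early_fusion": "suited to high-resolution radar",
        "middle_fusion": "suited to BEV representations",
        "late_fusion": "suited to independent detectors",
    },
}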

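3. Technology Selection Recommendations

Requirement              Recommended fusion   Hardware configuration
CPD child detection      Middle fusion + BEV  60 GHz radar + IR camera
Occupant classification  Early fusion         77 GHz radar + RGB camera
OOP posture detection    BEV fusion           79 GHz 4D radar + depth camera
Seat belt detection      Late fusion          60 GHz radar + RGB camera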

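Open Problems and Future Directions

1. Model Robustness

  • Challenges: occlusion, extreme weather, sensor failure
  • Directions: adaptive fusion, robust attention mechanisms

2. Modality Uncertainty

  • Challenges: radar noise, false detections from the camera
  • Directions: uncertainty quantification, confidence-weighted fusion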
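
To make "confidence-weighted fusion" concrete, one textbook form is inverse-variance weighting of two independent estimates of the same quantity, for example a radar range and a camera depth for one object. The sketch below assumes both estimates are unbiased with known variances; the function name `fuse_estimates` and the numbers are illustrative, not taken from the survey.

```python
def fuse_estimates(mean_a, var_a, mean_b, var_b):
    """Inverse-variance weighted fusion of two independent scalar estimates."""
    w_a = 1.0 / var_a                 # more certain estimates get larger weights
    w_b = 1.0 / var_b
    fused_var = 1.0 / (w_a + w_b)     # never larger than either input variance
    fused_mean = fused_var * (w_a * mean_a + w_b * mean_b)
    return fused_mean, fused_var

# Radar range: precise. Camera depth: noisy. The fused value stays near the radar.
fused_mean, fused_var = fuse_estimates(mean_a=20.0, var_a=0.25, mean_b=22.0, var_b=4.0)
print(fused_mean, fused_var)
```

Learned fusion networks replace the fixed variances with predicted ones, but the weighting principle is the same.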

3. Handling Missing Modalities

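from typing import Tuple

import torch
import torch.nn as nn


class MissingModalityHandler(nn.Module):
    """Missing-modality handler."""

    def __init__(self, feature_dim: int = 256):
        super().__init__()

        # Modality completion networks
        self.radar_completion = nn.Linear(feature_dim, feature_dim)
        self.camera_completion = nn.Linear(feature_dim, feature_dim)

    def forward(self,
                radar_features: torch.Tensor,
                camera_features: torch.Tensor,
                radar_valid: torch.Tensor,
                camera_valid: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
        """
        Args:
            radar_features: (B, N, D)
            camera_features: (B, M, D)
            radar_valid: (B,) bool
            camera_valid: (B,) bool

        Returns:
            completed_radar: (B, N, D)
            completed_camera: (B, M, D)
        """
        # Per-sample masks, broadcast over the token and channel dimensions.
        # Checking `.all()` on the whole batch would overwrite valid samples
        # too and change the output shape.
        radar_mask = radar_valid.view(-1, 1, 1)
        camera_mask = camera_valid.view(-1, 1, 1)

        # Radar missing: fill in from the pooled camera features
        radar_fill = self.radar_completion(
            camera_features.mean(dim=1, keepdim=True))   # (B, 1, D)
        completed_radar = torch.where(radar_mask, radar_features, radar_fill)

        # Camera missing: fill in from the pooled radar features
        camera_fill = self.camera_completion(
            radar_features.mean(dim=1, keepdim=True))    # (B, 1, D)
        completed_camera = torch.where(camera_mask, camera_features, camera_fill)

        return completed_radar, completed_camera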
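
A completion network like the one above only helps if the model has seen missing inputs during training. One widely used trick is modality dropout: randomly blanking one sensor per training sample. The NumPy sketch below is a hypothetical illustration; `modality_dropout` and `p_drop` are names chosen here, and it assumes `p_drop <= 0.5` so that the two drop intervals never overlap.

```python
import numpy as np

def modality_dropout(radar_feats, camera_feats, p_drop=0.2, rng=None):
    """Zero out one modality per sample with probability p_drop each.

    The two drop events use disjoint intervals of one random draw, so a
    sample never loses both modalities. Returns the masked features and
    the per-sample validity flags.
    """
    rng = np.random.default_rng() if rng is None else rng
    draw = rng.random(radar_feats.shape[0])
    drop_radar = draw < p_drop
    drop_camera = (draw >= p_drop) & (draw < 2 * p_drop)

    radar_out = radar_feats * (~drop_radar)[:, None, None]
    camera_out = camera_feats * (~drop_camera)[:, None, None]
    return radar_out, camera_out, ~drop_radar, ~drop_camera

rng = np.random.default_rng(0)
radar = rng.normal(size=(8, 16, 4)) + 10.0     # (B, N, D), offset so no value is zero by chance
camera = rng.normal(size=(8, 32, 4)) + 10.0    # (B, M, D)
radar_out, camera_out, radar_valid, camera_valid = modality_dropout(
    radar, camera, p_drop=0.5, rng=rng)
print(radar_valid, camera_valid)
```

The validity flags returned here have the same `(B,)` boolean layout that `MissingModalityHandler.forward` expects.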

References

  1. Shi, K. et al. “Radar and Camera Fusion for Object Detection and Tracking: A Comprehensive Survey.” arXiv 2024.
  2. nuScenes Dataset: https://www.nuscenes.org/
  3. RCBEVDet: Radar-camera Fusion in Bird’s Eye View for 3D Object Detection. CVPR 2024.
  4. CR3DT: Camera-RADAR fusion for 3D detection and tracking. 2024.

This article walks through radar-camera fusion techniques in detail, with illustrative code and guidance for IMS deployment.


https://dapalm.com/2026/06/20/2026-06-20-radar-camera-fusion-survey/
Author: Mars
Published: June 20, 2026