雷达-摄像头融合感知：目标检测与跟踪综合综述

论文来源： arXiv 2410.19872v1, 2024
作者： Kun Shi et al. (Nanyang Technological University, Zhejiang University)
核心价值： 系统性梳理雷达-摄像头融合的技术路线，为IMS多模态感知提供指导

传感器特性对比

各传感器能力矩阵

传感器	语义信息	测距能力	角分辨率	速度测量	光照适应	天气鲁棒	烟雾穿透
超声波	❌ 差	△ 一般	❌ 差	△ 一般	✅ 优	✅ 优	△ 一般
红外	△ 一般	△ 一般	✅ 好	❌ 差	△ 一般	❌ 差	❌ 差
LiDAR	△ 一般	✅ 好	✅ 好	❌ 差	✅ 优	△ 一般	❌ 差
RGB摄像头	✅ 优	△ 一般	✅ 好	❌ 差	❌ 差	❌ 差	❌ 差
mmWave雷达	❌ 差	✅ 优	✅ 好	✅ 优	✅ 优	✅ 优	✅ 优

核心洞察：

mmWave雷达是唯一在所有天气、光照、温度条件下都有效工作的传感器
雷达的优势：速度测量、全天候工作
雷达的劣势：数据稀疏、角分辨率低、杂波干扰
摄像头的优势：丰富的语义信息、高分辨率
摄像头的劣势：光照敏感、天气影响

融合架构分类

1. 按融合阶段分类

┌─────────────────────────────────────────────────────────────┐
│                      融合阶段分类                            │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Early Fusion (早期融合)                                     │
│  ┌─────┐   ┌─────┐                                          │
│  │Camera│   │Radar│                                         │
│  └──┬──┘   └──┬──┘                                          │
│     │         │                                             │
│     └────┬────┘                                             │
│          ↓                                                  │
│    ┌──────────┐                                             │
│    │数据级融合 │                                             │
│    └────┬─────┘                                             │
│         ↓                                                   │
│    ┌──────────┐                                             │
│    │ 特征提取 │                                             │
│    └────┬─────┘                                             │
│         ↓                                                   │
│    ┌──────────┐                                             │
│    │ 检测网络 │                                             │
│    └──────────┘                                             │
│                                                             │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Middle Fusion (中期融合)                                    │
│  ┌─────┐   ┌─────┐                                          │
│  │Camera│   │Radar│                                         │
│  └──┬──┘   └──┬──┘                                          │
│     │         │                                             │
│     ↓         ↓                                             │
│  ┌──────┐ ┌──────┐                                          │
│  │特征  │ │特征  │                                          │
│  │提取  │ │提取  │                                          │
│  └──┬───┘ └──┬───┘                                          │
│     │         │                                             │
│     └────┬────┘                                             │
│          ↓                                                  │
│    ┌──────────┐                                             │
│    │特征级融合 │                                             │
│    └────┬─────┘                                             │
│         ↓                                                   │
│    ┌──────────┐                                             │
│    │ 检测头   │                                             │
│    └──────────┘                                             │
│                                                             │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Late Fusion (后期融合)                                      │
│  ┌─────┐   ┌─────┐                                          │
│  │Camera│   │Radar│                                         │
│  └──┬──┘   └──┬──┘                                          │
│     │         │                                             │
│     ↓         ↓                                             │
│  ┌──────┐ ┌──────┐                                          │
│  │独立  │ │独立  │                                          │
│  │检测  │ │检测  │                                          │
│  └──┬───┘ └──┬───┘                                          │
│     │         │                                             │
│     └────┬────┘                                             │
│          ↓                                                  │
│    ┌──────────┐                                             │
│    │决策级融合 │                                             │
│    └──────────┘                                             │
│                                                             │
└─────────────────────────────────────────────────────────────┘

2. 融合表示方法

表示方法	描述	优势	劣势
投影式	雷达点投影到图像平面	简单直接	信息损失
BEV式	统一到鸟瞰图	空间一致性	计算量大
Query式	基于Transformer查询	端到端学习	需要大量数据

核心技术模块

1. 传感器标定

"""
雷达-摄像头标定模块
目标：建立雷达坐标系与摄像头坐标系的映射关系
"""

import numpy as np
from typing import Tuple
from dataclasses import dataclass


@dataclass
class CalibrationResult:
    """标定结果"""
    rotation_matrix: np.ndarray  # 3x3 旋转矩阵
    translation_vector: np.ndarray  # 3x1 平移向量
    camera_matrix: np.ndarray  # 3x3 相机内参
    dist_coeffs: np.ndarray  # 畸变系数
    reprojection_error: float  # 重投影误差


class RadarCameraCalibrator:
    """
    雷达-摄像头标定器
    
    方法：
    1. 基于目标的标定（Target-Based）
    2. 无目标标定（Target-Free）
    """
    
    def __init__(
        self,
        image_width: int = 1920,
        image_height: int = 1080
    ):
        self.image_width = image_width
        self.image_height = image_height
        
    def target_based_calibration(
        self,
        image_points: np.ndarray,
        radar_points: np.ndarray,
        camera_matrix: np.ndarray
    ) -> CalibrationResult:
        """
        基于目标的标定
        
        使用角反射器作为标定目标
        
        Args:
            image_points: 图像坐标, shape=(N, 2)
            radar_points: 雷达坐标, shape=(N, 3), (x, y, z)
            camera_matrix: 相机内参, shape=(3, 3)
            
        Returns:
            result: 标定结果
        """
        # PnP求解
        success, rvec, tvec = cv2.solvePnP(
            radar_points,
            image_points,
            camera_matrix,
            None
        )
        
        if not success:
            raise ValueError("Calibration failed")
            
        # 旋转向量转矩阵
        R, _ = cv2.Rodrigues(rvec)
        
        # 计算重投影误差
        projected, _ = cv2.projectPoints(
            radar_points, rvec, tvec, camera_matrix, None
        )
        projected = projected.reshape(-1, 2)
        error = np.mean(np.linalg.norm(image_points - projected, axis=1))
        
        return CalibrationResult(
            rotation_matrix=R,
            translation_vector=tvec,
            camera_matrix=camera_matrix,
            dist_coeffs=np.zeros(5),
            reprojection_error=error
        )
    
    def project_radar_to_image(
        self,
        radar_point: np.ndarray,
        calibration: CalibrationResult
    ) -> Tuple[int, int]:
        """
        将雷达点投影到图像平面
        
        Args:
            radar_point: 雷达坐标 (x, y, z)
            calibration: 标定结果
            
        Returns:
            image_point: 图像坐标 (u, v)
        """
        # 坐标变换
        R = calibration.rotation_matrix
        t = calibration.translation_vector
        K = calibration.camera_matrix
        
        # 雷达坐标系 -> 相机坐标系
        point_camera = R @ radar_point + t
        
        # 相机坐标系 -> 图像坐标系
        point_image = K @ point_camera
        
        # 归一化
        u = int(point_image[0] / point_image[2])
        v = int(point_image[1] / point_image[2])
        
        return (u, v)
    
    def target_free_calibration(
        self,
        image_sequences: list,
        radar_sequences: list
    ) -> CalibrationResult:
        """
        无目标标定
        
        使用运动恢复结构（SFM）和ICP
        
        Args:
            image_sequences: 图像序列
            radar_sequences: 雷达数据序列
            
        Returns:
            result: 标定结果
        """
        # 这是一个简化的实现框架
        # 实际实现需要：
        # 1. 从图像序列恢复相机位姿（SFM）
        # 2. 从雷达序列估计运动（ICP）
        # 3. 求解坐标系变换
        
        pass


# 测试代码
if __name__ == "__main__":
    import cv2
    
    calibrator = RadarCameraCalibrator()
    
    # 模拟标定数据
    # 假设有4个角反射器
    radar_points = np.array([
        [5.0, 0.0, 1.0],   # 右前
        [5.0, -2.0, 1.0],  # 左前
        [10.0, 0.0, 1.0],  # 右后
        [10.0, -2.0, 1.0]  # 左后
    ], dtype=np.float32)
    
    # 对应的图像坐标
    image_points = np.array([
        [1200, 400],
        [720, 400],
        [1100, 500],
        [820, 500]
    ], dtype=np.float32)
    
    # 相机内参
    camera_matrix = np.array([
        [1000, 0, 960],
        [0, 1000, 540],
        [0, 0, 1]
    ], dtype=np.float32)
    
    # 标定
    result = calibrator.target_based_calibration(
        image_points, radar_points, camera_matrix
    )
    
    print("=== 雷达-摄像头标定测试 ===")
    print(f"旋转矩阵:\n{result.rotation_matrix}")
    print(f"平移向量: {result.translation_vector.T}")
    print(f"重投影误差: {result.reprojection_error:.2f} 像素")

2. 数据对齐

"""
数据对齐模块
解决雷达和摄像头的时间同步与空间对齐问题
"""

import numpy as np
from typing import List, Tuple, Optional
from dataclasses import dataclass


@dataclass
class RadarDetection:
    """雷达检测数据"""
    timestamp: float  # 秒
    range_m: float  # 距离（米）
    azimuth_deg: float  # 方位角（度）
    elevation_deg: float  # 俯仰角（度）
    velocity_mps: float  # 径向速度（米/秒）
    rcs_dbsm: float  # RCS值
    snr_db: float  # 信噪比


@dataclass
class CameraDetection:
    """摄像头检测数据"""
    timestamp: float
    bbox: Tuple[int, int, int, int]  # (x, y, w, h)
    class_id: int
    confidence: float
    features: Optional[np.ndarray] = None


class DataAligner:
    """
    数据对齐器
    
    功能：
    1. 时间对齐（插值/最近邻）
    2. 空间对齐（坐标变换）
    """
    
    def __init__(
        self,
        time_sync_threshold: float = 0.05,  # 50ms
        spatial_threshold: float = 2.0  # 2米
    ):
        self.time_sync_threshold = time_sync_threshold
        self.spatial_threshold = spatial_threshold
        
    def temporal_alignment(
        self,
        radar_detections: List[RadarDetection],
        camera_detections: List[CameraDetection]
    ) -> List[Tuple[RadarDetection, CameraDetection]]:
        """
        时间对齐
        
        Args:
            radar_detections: 雷达检测列表
            camera_detections: 摄像头检测列表
            
        Returns:
            aligned_pairs: 对齐的检测对
        """
        aligned_pairs = []
        
        for radar_det in radar_detections:
            # 找最近时间戳的摄像头检测
            min_dt = float('inf')
            best_match = None
            
            for camera_det in camera_detections:
                dt = abs(radar_det.timestamp - camera_det.timestamp)
                
                if dt < min_dt:
                    min_dt = dt
                    best_match = camera_det
                    
            if min_dt <= self.time_sync_threshold:
                aligned_pairs.append((radar_det, best_match))
                
        return aligned_pairs
    
    def spatial_alignment(
        self,
        radar_detection: RadarDetection,
        camera_detection: CameraDetection,
        calibration: CalibrationResult,
        depth_estimate: Optional[float] = None
    ) -> float:
        """
        空间对齐
        
        计算雷达检测与摄像头检测的空间距离
        
        Args:
            radar_detection: 雷达检测
            camera_detection: 摄像头检测
            calibration: 标定结果
            depth_estimate: 深度估计（米）
            
        Returns:
            distance: 空间距离（米）
        """
        # 雷达坐标（极坐标转笛卡尔）
        range_m = radar_detection.range_m
        azimuth = np.radians(radar_detection.azimuth_deg)
        elevation = np.radians(radar_detection.elevation_deg)
        
        radar_x = range_m * np.cos(elevation) * np.sin(azimuth)
        radar_y = range_m * np.cos(elevation) * np.cos(azimuth)
        radar_z = range_m * np.sin(elevation)
        
        radar_point = np.array([radar_x, radar_y, radar_z])
        
        # 投影到图像
        image_point = project_radar_to_image(radar_point, calibration)
        
        # 摄像头检测框中心
        x, y, w, h = camera_detection.bbox
        camera_center = (x + w // 2, y + h // 2)
        
        # 图像平面距离
        image_distance = np.sqrt(
            (image_point[0] - camera_center[0]) ** 2 +
            (image_point[1] - camera_center[1]) ** 2
        )
        
        return image_distance
    
    def associate_detections(
        self,
        radar_detections: List[RadarDetection],
        camera_detections: List[CameraDetection],
        calibration: CalibrationResult
    ) -> List[Tuple[RadarDetection, CameraDetection, float]]:
        """
        检测关联
        
        使用匈牙利算法进行最优匹配
        
        Args:
            radar_detections: 雷达检测列表
            camera_detections: 摄像头检测列表
            calibration: 标定结果
            
        Returns:
            matches: 匹配结果列表 (radar, camera, distance)
        """
        from scipy.optimize import linear_sum_assignment
        
        # 计算代价矩阵
        n_radar = len(radar_detections)
        n_camera = len(camera_detections)
        
        cost_matrix = np.full((n_radar, n_camera), 1e6)
        
        for i, radar_det in enumerate(radar_detections):
            for j, camera_det in enumerate(camera_detections):
                # 时间差异
                dt = abs(radar_det.timestamp - camera_det.timestamp)
                
                if dt > self.time_sync_threshold:
                    continue
                    
                # 空间距离
                spatial_dist = self.spatial_alignment(
                    radar_det, camera_det, calibration
                )
                
                if spatial_dist < self.spatial_threshold:
                    cost_matrix[i, j] = spatial_dist
                    
        # 匈牙利算法
        row_ind, col_ind = linear_sum_assignment(cost_matrix)
        
        # 构建匹配结果
        matches = []
        for i, j in zip(row_ind, col_ind):
            if cost_matrix[i, j] < 1e6:
                matches.append((
                    radar_detections[i],
                    camera_detections[j],
                    cost_matrix[i, j]
                ))
                
        return matches

3. BEV融合表示

"""
BEV (Bird's Eye View) 融合表示模块
将摄像头和雷达数据统一到鸟瞰图表示
"""

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from typing import Tuple, Optional


class BEVFeatureExtractor(nn.Module):
    """
    BEV特征提取器
    
    将摄像头图像特征转换为BEV表示
    """
    
    def __init__(
        self,
        image_size: Tuple[int, int] = (1080, 1920),
        bev_size: Tuple[int, int] = (200, 200),
        bev_range: Tuple[float, float, float, float] = (-50, 50, -50, 50),
        feature_dim: int = 64
    ):
        """
        Args:
            image_size: 图像尺寸 (H, W)
            bev_size: BEV尺寸 (H, W)
            bev_range: BEV范围 (x_min, x_max, y_min, y_max)
            feature_dim: 特征维度
        """
        super().__init__()
        
        self.image_size = image_size
        self.bev_size = bev_size
        self.bev_range = bev_range
        self.feature_dim = feature_dim
        
        # 图像特征提取
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=7, stride=2, padding=3),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, feature_dim, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(feature_dim),
            nn.ReLU(inplace=True)
        )
        
        # 深度预测分支
        self.depth_head = nn.Sequential(
            nn.Conv2d(feature_dim, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, kernel_size=1)
        )
        
        # BEV投影
        self.bev_projection = nn.Sequential(
            nn.Conv2d(feature_dim + 1, feature_dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(feature_dim),
            nn.ReLU(inplace=True)
        )
        
    def forward(
        self,
        images: torch.Tensor,
        depth_gt: Optional[torch.Tensor] = None
    ) -> torch.Tensor:
        """
        Args:
            images: (batch, 3, H, W)
            depth_gt: (batch, H, W) 深度真值（可选）
            
        Returns:
            bev_features: (batch, feature_dim, bev_H, bev_W)
        """
        batch = images.size(0)
        
        # 图像特征
        image_features = self.image_encoder(images)
        
        # 深度预测
        depth_pred = self.depth_head(image_features).squeeze(1)
        
        # 拼接深度
        combined = torch.cat([
            image_features,
            depth_pred.unsqueeze(1)
        ], dim=1)
        
        # 投影到BEV
        bev_features = self.bev_projection(combined)
        
        # 简单的空间变换
        # 实际实现需要相机内参和外参
        bev_features = F.interpolate(
            bev_features,
            size=self.bev_size,
            mode='bilinear',
            align_corners=False
        )
        
        return bev_features, depth_pred


class RadarBEVEncoder(nn.Module):
    """
    雷达BEV编码器
    
    将雷达点云转换为BEV表示
    """
    
    def __init__(
        self,
        bev_size: Tuple[int, int] = (200, 200),
        bev_range: Tuple[float, float, float, float] = (-50, 50, -50, 50),
        feature_dim: int = 64
    ):
        super().__init__()
        
        self.bev_size = bev_size
        self.bev_range = bev_range
        self.feature_dim = feature_dim
        
        # 点云编码
        self.point_encoder = nn.Sequential(
            nn.Linear(5, 32),  # x, y, z, velocity, rcs
            nn.ReLU(inplace=True),
            nn.Linear(32, feature_dim),
            nn.ReLU(inplace=True)
        )
        
        # BEV卷积
        self.bev_conv = nn.Sequential(
            nn.Conv2d(feature_dim, feature_dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(feature_dim),
            nn.ReLU(inplace=True),
            nn.Conv2d(feature_dim, feature_dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(feature_dim),
            nn.ReLU(inplace=True)
        )
        
    def forward(
        self,
        radar_points: torch.Tensor
    ) -> torch.Tensor:
        """
        Args:
            radar_points: (batch, N, 5) x, y, z, velocity, rcs
            
        Returns:
            bev_features: (batch, feature_dim, bev_H, bev_W)
        """
        batch = radar_points.size(0)
        N = radar_points.size(1)
        
        # 点特征编码
        point_features = self.point_encoder(radar_points)  # (batch, N, feature_dim)
        
        # 初始化BEV网格
        bev_grid = torch.zeros(
            batch, self.feature_dim, *self.bev_size,
            device=radar_points.device
        )
        
        # 坐标转换
        x = radar_points[:, :, 0]
        y = radar_points[:, :, 1]
        
        x_min, x_max, y_min, y_max = self.bev_range
        
        # 归一化到BEV网格
        bev_x = ((x - x_min) / (x_max - x_min) * self.bev_size[1]).long()
        bev_y = ((y - y_min) / (y_max - y_min) * self.bev_size[0]).long()
        
        # 边界检查
        valid_mask = (
            (bev_x >= 0) & (bev_x < self.bev_size[1]) &
            (bev_y >= 0) & (bev_y < self.bev_size[0])
        )
        
        # 散点聚集
        for b in range(batch):
            for i in range(N):
                if valid_mask[b, i]:
                    px, py = bev_x[b, i], bev_y[b, i]
                    bev_grid[b, :, py, px] = point_features[b, i]
                    
        # BEV卷积
        bev_features = self.bev_conv(bev_grid)
        
        return bev_features


class RadarCameraBEVFusion(nn.Module):
    """
    雷达-摄像头BEV融合网络
    """
    
    def __init__(
        self,
        bev_size: Tuple[int, int] = (200, 200),
        feature_dim: int = 64
    ):
        super().__init__()
        
        self.camera_encoder = BEVFeatureExtractor(
            bev_size=bev_size,
            feature_dim=feature_dim
        )
        
        self.radar_encoder = RadarBEVEncoder(
            bev_size=bev_size,
            feature_dim=feature_dim
        )
        
        # 融合模块
        self.fusion = nn.Sequential(
            nn.Conv2d(feature_dim * 2, feature_dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(feature_dim),
            nn.ReLU(inplace=True),
            nn.Conv2d(feature_dim, feature_dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(feature_dim),
            nn.ReLU(inplace=True)
        )
        
        # 检测头
        self.detection_head = nn.Sequential(
            nn.Conv2d(feature_dim, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 6, kernel_size=1)  # x, y, w, h, class, confidence
        )
        
    def forward(
        self,
        images: torch.Tensor,
        radar_points: torch.Tensor
    ) -> Tuple[torch.Tensor, dict]:
        """
        Args:
            images: (batch, 3, H, W)
            radar_points: (batch, N, 5)
            
        Returns:
            detections: (batch, 6, bev_H, bev_W)
            features: 中间特征
        """
        # 摄像头BEV特征
        camera_bev, depth_pred = self.camera_encoder(images)
        
        # 雷达BEV特征
        radar_bev = self.radar_encoder(radar_points)
        
        # 融合
        fused = torch.cat([camera_bev, radar_bev], dim=1)
        fused = self.fusion(fused)
        
        # 检测
        detections = self.detection_head(fused)
        
        return detections, {
            'camera_bev': camera_bev,
            'radar_bev': radar_bev,
            'fused': fused,
            'depth_pred': depth_pred
        }


# 测试代码
if __name__ == "__main__":
    # 模拟数据
    batch_size = 2
    
    # 图像
    images = torch.randn(batch_size, 3, 1080, 1920)
    
    # 雷达点云
    n_points = 100
    radar_points = torch.randn(batch_size, n_points, 5)
    
    # 创建模型
    model = RadarCameraBEVFusion(
        bev_size=(200, 200),
        feature_dim=64
    )
    
    # 前向传播
    detections, features = model(images, radar_points)
    
    print("=== 雷达-摄像头BEV融合测试 ===")
    print(f"图像形状: {images.shape}")
    print(f"雷达点云形状: {radar_points.shape}")
    print(f"检测结果形状: {detections.shape}")
    print(f"参数量: {sum(p.numel() for p in model.parameters()):,}")

应用场景

智能交通

应用	描述	优势
车辆检测	多模态融合检测	全天候、高精度
行人检测	弱势道路使用者保护	夜间、恶劣天气有效
交通监控	车流量统计	无需光照

IMS车内监控

应用	摄像头	雷达	融合优势
CPD	视觉受限	穿透毛毯	全覆盖
OOP	姿态估计	距离测量	精确定位
生命体征	不可行	微动检测	非接触监测

IMS开发启示

1. 传感器选型建议

# IMS传感器配置矩阵
SENSOR_CONFIG = {
    'basic': {
        'sensors': ['IR_camera'],
        'cost': '$50-100',
        'coverage': ['DSM', 'basic_OMS']
    },
    'standard': {
        'sensors': ['IR_camera', 'TOF_depth'],
        'cost': '$150-300',
        'coverage': ['DSM', 'OMS', 'OOP']
    },
    'advanced': {
        'sensors': ['IR_camera', 'TOF_depth', 'mmWave_radar'],
        'cost': '$300-500',
        'coverage': ['DSM', 'OMS', 'OOP', 'CPD', 'vital_signs']
    }
}

2. 融合策略

场景	主传感器	辅助传感器	融合方法
白天DSM	摄像头	-	单模态
夜间DSM	IR摄像头	-	单模态
CPD	mmWave雷达	摄像头	后期融合
OOP	深度摄像头	雷达	BEV融合
生命体征	mmWave雷达	-	单模态

总结

维度	内容
核心价值	系统性梳理雷达-摄像头融合技术路线
关键技术	标定、对齐、表示、融合
融合架构	Early/Middle/Late + Projection/BEV/Query
IMS应用	CPD雷达方案、OOP深度检测、多模态融合
趋势	BEV融合、Transformer架构、端到端学习

发布时间： 2026-04-22
标签： #雷达摄像头融合 #多模态感知 #目标检测 #BEV #IMS

技术研究

#DMS #IMS

雷达-摄像头融合感知：目标检测与跟踪综合综述

https://dapalm.com/2026/04/22/2026-04-22-radar-camera-fusion-survey/

作者

Mars

发布于

2026年4月22日

许可协议

Qualcomm NPU边缘部署：ONNX Runtime + Hexagon DSP优化指南上一篇

安全带错误佩戴检测：Euro NCAP新要求与计算机视觉方案下一篇