Cognitive Distraction Detection Breakthrough: Gaze-READ, an Unsupervised Eye-Movement Anomaly Detection Framework

1. Challenges in Cognitive Distraction Detection

Euro NCAP 2026 lists cognitive distraction as a key assessment item, but detecting it is technically very challenging:

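Distraction type      | Detection method          | Difficulty | Status
Visual distraction    | Gaze-off-road detection   | ⭐⭐       | ✅ Mature
Physical distraction  | Phone/object detection    | ⭐⭐       | ✅ Mature
Cognitive distraction | Mind-wandering detection  | ⭐⭐⭐⭐⭐ | ❌ Still unsolved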

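Characteristics of cognitive distraction:

  • Eyes on the road, but the mind is elsewhere
  • No obvious physical behavior
  • Must be inferred from eye-movement patterns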

Limitations of traditional methods:

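# Traditional threshold method (limited effectiveness)
import numpy as np

# Fixed, hand-tuned thresholds (illustrative values, in gaze-coordinate units)
THRESHOLD_DISPERSION = 0.15
FIXATION_SPEED = 0.5   # units per second; below this a sample counts as fixating
SACCADE_SPEED = 3.0    # units per second; above this a sample counts as a saccade


def detect_cognitive_distraction_traditional(gaze_points, fs=30):
    """
    Traditional approach: fixed rules.

    gaze_points: (N, 2) array of gaze coordinates sampled at fs Hz
    """
    # 1. Gaze dispersion: summed standard deviation of x and y
    dispersion = np.std(gaze_points, axis=0).sum()

    # 2. Fixation duration: time spent below the fixation speed (seconds)
    speed = np.linalg.norm(np.diff(gaze_points, axis=0), axis=1) * fs
    fixation_duration = np.sum(speed < FIXATION_SPEED) / fs

    # 3. Saccade frequency: number of saccade onsets
    is_saccade = (speed > SACCADE_SPEED).astype(int)
    saccade_count = int(np.sum(np.diff(is_saccade, prepend=0) == 1))

    # Fixed-threshold decision (this is where the problem lies): the same
    # thresholds apply to every driver and every scene, so the rule cannot
    # adapt to individual differences or changing conditions
    if dispersion > THRESHOLD_DISPERSION:
        return "cognitive_distraction"
    return "normal"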
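The toy comparison below makes the individual-difference problem concrete. It uses synthetic numbers: every value is invented for illustration, and the function names are hypothetical. Two drivers have different habitual gaze dispersion. One shared threshold is compared with a per-driver z-score taken against each driver's own baseline.

```python
import numpy as np

GLOBAL_THRESHOLD = 0.15  # one threshold shared by every driver (illustrative value)


def global_rule(dispersion):
    """Fixed rule: flag any window whose dispersion exceeds the shared threshold."""
    return bool(dispersion > GLOBAL_THRESHOLD)


def personal_rule(dispersion, baseline, z_max=3.0):
    """Per-driver rule: flag windows more than z_max standard deviations away
    from this driver's own baseline, in either direction."""
    mu = np.mean(baseline)
    sigma = np.std(baseline) + 1e-9
    return bool(abs(dispersion - mu) / sigma > z_max)


# Synthetic dispersion baselines recorded during normal driving
baseline_a = np.array([0.18, 0.20, 0.19, 0.21, 0.20])  # driver A scans widely
baseline_b = np.array([0.06, 0.05, 0.07, 0.06, 0.06])  # driver B keeps a narrow gaze

# An ordinary window for A, and a window for B at twice B's usual spread
print(global_rule(0.20), personal_rule(0.20, baseline_a))  # True False: global false alarm
print(global_rule(0.12), personal_rule(0.12, baseline_b))  # False True: global miss
```

The per-driver rule is two-sided on purpose. It only asks whether a window is unusual for this driver, without assuming in advance whether distraction widens or narrows the gaze.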

2. Innovations of the Gaze-READ Framework

2.1 Core Idea

Source: paper published on ScienceDirect, May 2025

Key innovation: unsupervised anomaly detection replaces supervised classification

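Traditional approach: collect large amounts of labeled data → train a classifier → separate distracted from normal

Problem: high labeling cost, limited to the scenarios the labels cover

Gaze-READ approach:
1. Model "normal" eye-movement behavior (no labels required)
2. Detect behavior that deviates from "normal"
3. Anomaly = potential cognitive distraction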
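A deliberately simple stand-in illustrates the label-free idea. It fits a per-feature Gaussian on normal windows only. It then takes the alarm threshold from the normal data's own score distribution (here its 95th percentile), so no distraction labels are used at any step. The diagonal-Gaussian scorer and the synthetic data are our simplification, not the model used in the paper.

```python
import numpy as np


class NormalOnlyDetector:
    """Toy one-class detector: learns 'normal' from unlabeled normal data only."""

    def fit(self, normal_features, quantile=0.95):
        # Step 1: model normal behavior (per-feature mean and std, no labels)
        self.mu = normal_features.mean(axis=0)
        self.sigma = normal_features.std(axis=0) + 1e-9
        # Pick the threshold from normal scores: about 5% of normal windows exceed it
        self.threshold = np.quantile(self.score(normal_features), quantile)
        return self

    def score(self, features):
        # Root-mean-square z-score across features: larger = further from normal
        z = (np.atleast_2d(features) - self.mu) / self.sigma
        return np.sqrt((z ** 2).mean(axis=1))

    def is_anomalous(self, features):
        # Steps 2-3: a deviation beyond the normal range is a potential distraction
        return self.score(features) > self.threshold


rng = np.random.default_rng(0)
normal = rng.normal(loc=[0.0, 1.0], scale=[1.0, 0.5], size=(500, 2))  # synthetic
detector = NormalOnlyDetector().fit(normal)
flags = detector.is_anomalous(np.array([[0.1, 1.1], [6.0, 4.0]]))
print(flags)  # first window looks normal, second is flagged
```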

2.2 Architecture

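┌───────────────────────────────────────────────────┐
│                Gaze-READ framework                │
├───────────────────────────────────────────────────┤
│                                                   │
│ Input layer: eye-movement time series             │
│  ├─ Gaze point coordinates (x, y)                 │
│  ├─ Pupil diameter                                │
│  ├─ Blink rate                                    │
│  └─ Fixation duration                             │
│                                                   │
│ Encoding layer: MOMENT foundation model           │
│  ├─ Pretrained time-series encoder                │
│  ├─ No fine-tuning on driving data required       │
│  └─ Extracts time-series feature embeddings       │
│                                                   │
│ Representation layer: modeling normal behavior    │
│  ├─ Control-group (normal driving) gaze data      │
│  ├─ Clustering / density estimation               │
│  └─ Boundary of the "normal" distribution         │
│                                                   │
│ Detection layer: anomaly detection                │
│  ├─ Real-time gaze feature extraction             │
│  ├─ Comparison with the "normal" distribution     │
│  └─ Deviation > threshold → anomaly alert         │
│                                                   │
└───────────────────────────────────────────────────┘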
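The skeleton below shows how the four layers hand data to each other without downloading MOMENT. It replaces the foundation-model encoder with hand-written summary statistics; this substitution is ours, purely for illustration. The window length, stride, standardized distance and all function names are assumptions.

```python
import numpy as np


def input_layer(gaze, window=90, stride=30):
    """Input layer: slice a (T, C) gaze stream into overlapping fixed-length windows."""
    starts = range(0, len(gaze) - window + 1, stride)
    return np.stack([gaze[s:s + window] for s in starts])  # (n_windows, window, C)


def encoding_layer(windows):
    """Encoding-layer stand-in: per-channel mean, std and mean absolute step size.
    In Gaze-READ this role is played by the pretrained MOMENT encoder."""
    mean = windows.mean(axis=1)
    std = windows.std(axis=1)
    step = np.abs(np.diff(windows, axis=1)).mean(axis=1)
    return np.concatenate([mean, std, step], axis=1)  # (n_windows, 3 * C)


def representation_layer(normal_emb, quantile=0.95):
    """Representation layer: center, scale and boundary of the 'normal' embeddings."""
    center = normal_emb.mean(axis=0)
    scale = normal_emb.std(axis=0) + 1e-9
    dist = np.linalg.norm((normal_emb - center) / scale, axis=1)
    return center, scale, np.quantile(dist, quantile)


def detection_layer(emb, center, scale, boundary):
    """Detection layer: flag windows whose standardized distance exceeds the boundary."""
    dist = np.linalg.norm((emb - center) / scale, axis=1)
    return dist > boundary, dist


rng = np.random.default_rng(1)
normal_stream = rng.normal(0.0, 1.0, size=(3000, 2))  # long stretch of "normal" gaze
test_stream = rng.normal(0.0, 4.0, size=(300, 2))     # synthetic, far more erratic gaze

center, scale, boundary = representation_layer(encoding_layer(input_layer(normal_stream)))
normal_flags, _ = detection_layer(encoding_layer(input_layer(normal_stream)), center, scale, boundary)
test_flags, _ = detection_layer(encoding_layer(input_layer(test_stream)), center, scale, boundary)
print(normal_flags.mean(), test_flags.mean())  # about 0.05 vs. close to 1.0
```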

3. Core Implementation

3.1 The MOMENT Foundation Model

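import torch
import torch.nn as nn
from momentfm import MOMENTPipeline  # pip install momentfm

MOMENT_SEQ_LEN = 512  # MOMENT consumes fixed-length inputs of 512 time steps


class MOMENTEncoder(nn.Module):
    """
    MOMENT: time-series foundation model
    Paper: "MOMENT: A Family of Open Time-series Foundation Models"
    """
    def __init__(self, model_name="AutonLab/MOMENT-1-large"):
        super().__init__()

        # Load the pretrained model in embedding mode via the momentfm package
        # (MOMENT is not loaded through transformers.AutoModel)
        self.moment = MOMENTPipeline.from_pretrained(
            model_name, model_kwargs={"task_name": "embedding"}
        )
        self.moment.init()
        self.moment.eval()

        # Embedding dimension (1024 for the large variant)
        self.embed_dim = 1024

    def forward(self, gaze_sequence):
        """
        Encode an eye-movement time series.

        Args:
            gaze_sequence: (batch, seq_len, features)
                features = [x, y, pupil_diameter, blink_rate, ...]

        Returns:
            embedding: (batch, embed_dim) time-series feature embedding
        """
        # MOMENT expects channels first: (batch, n_channels, seq_len)
        x = gaze_sequence.transpose(1, 2)

        # Per-window, per-channel z-score normalization
        x = (x - x.mean(dim=2, keepdim=True)) / (x.std(dim=2, keepdim=True) + 1e-6)

        # Keep at most the last 512 steps, left-pad to 512 and mask the padding
        batch, n_channels, seq_len = x.shape
        seq_len = min(seq_len, MOMENT_SEQ_LEN)
        padded = x.new_zeros(batch, n_channels, MOMENT_SEQ_LEN)
        padded[:, :, -seq_len:] = x[:, :, -seq_len:]
        input_mask = x.new_zeros(batch, MOMENT_SEQ_LEN)
        input_mask[:, -seq_len:] = 1

        # MOMENT has no [CLS] token; in embedding mode it returns patch
        # embeddings averaged over channels and time
        outputs = self.moment(x_enc=padded, input_mask=input_mask)
        return outputs.embeddings


class GazeREAD(nn.Module):
    """
    Gaze-READ: Gaze Representative Embedding and Anomaly Detection
    """
    def __init__(self, embed_dim=1024, shrinkage=0.1):
        super().__init__()

        self.encoder = MOMENTEncoder()
        self.embed_dim = embed_dim
        self.shrinkage = shrinkage  # covariance regularization weight (assumed value)

    @torch.no_grad()
    def build_normal_representation(self, normal_gaze_data):
        """
        Build the representation of normal eye-movement behavior.

        Args:
            normal_gaze_data: gaze data from normal driving (N, seq_len, features)
        """
        # Encode all normal samples
        embeddings = self.encoder(normal_gaze_data)

        # Statistics of the normal distribution
        self.normal_mean = embeddings.mean(dim=0)
        cov = torch.cov(embeddings.T)
        # With fewer than 1024 samples the sample covariance is singular:
        # shrink it toward a scaled identity so that it can be inverted
        target = torch.eye(cov.shape[0], dtype=cov.dtype, device=cov.device) * cov.diagonal().mean()
        self.normal_cov = (1 - self.shrinkage) * cov + self.shrinkage * target

        # Keep the raw embeddings for the nearest-neighbour alternative
        # (or for kernel density estimation, KDE)
        self.normal_embeddings = embeddings

    @torch.no_grad()
    def compute_anomaly_score(self, gaze_sequence):
        """
        Compute the anomaly score.

        Args:
            gaze_sequence: gaze sequence to test (1, seq_len, features)

        Returns:
            anomaly_score: higher means more anomalous
        """
        embedding = self.encoder(gaze_sequence)  # (1, embed_dim)

        # Method 1: Mahalanobis distance (solve a linear system instead of inverting)
        delta = embedding - self.normal_mean
        solved = torch.linalg.solve(self.normal_cov, delta.T)  # (embed_dim, 1)
        mahalanobis_dist = torch.sqrt(delta @ solved)

        # Method 2: mean distance to the k nearest normal embeddings
        # distances = torch.norm(self.normal_embeddings - embedding, dim=1)
        # knn_dist = distances.topk(k=5, largest=False).values.mean()

        return mahalanobis_dist.item()

    def detect_anomaly(self, gaze_sequence, threshold=3.0):
        """
        Detect an anomaly (cognitive distraction).

        Args:
            threshold: anomaly threshold on the Mahalanobis distance. In 1024
                dimensions typical normal samples already score far above 3,
                so calibrate it on held-out normal data.

        Returns:
            is_anomaly: bool
            anomaly_score: float
        """
        score = self.compute_anomaly_score(gaze_sequence)
        return score > threshold, score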
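One practical point about `build_normal_representation`: MOMENT-large embeddings have 1024 dimensions, but a calibration session usually yields far fewer windows. The sample covariance of the normal embeddings is therefore singular and cannot be inverted directly. A common remedy is shrinkage toward a scaled identity. The NumPy sketch below shows the effect outside the model code, using a fixed, hand-picked shrinkage weight (a Ledoit-Wolf estimate would choose it from the data). It places this next to the k-nearest-neighbour score listed as method 2. Sizes and data are synthetic, and the dimension is reduced to 256 to keep it fast.

```python
import numpy as np


def fit_shrunk_gaussian(emb, alpha=0.1):
    """Mean and inverse of the shrunk covariance (1 - alpha) * S + alpha * (tr(S) / d) * I."""
    mean = emb.mean(axis=0)
    cov = np.cov(emb, rowvar=False)
    d = cov.shape[0]
    shrunk = (1 - alpha) * cov + alpha * (np.trace(cov) / d) * np.eye(d)
    return mean, np.linalg.inv(shrunk)


def mahalanobis(x, mean, cov_inv):
    """Mahalanobis distance of each row of x from the normal mean."""
    delta = np.atleast_2d(x) - mean
    return np.sqrt(np.einsum("ij,jk,ik->i", delta, cov_inv, delta))


def knn_score(x, normal_emb, k=5):
    """Mean Euclidean distance from each row of x to its k nearest normal embeddings."""
    dists = np.linalg.norm(normal_emb[None, :, :] - np.atleast_2d(x)[:, None, :], axis=2)
    return np.sort(dists, axis=1)[:, :k].mean(axis=1)


rng = np.random.default_rng(0)
n_windows, dim = 60, 256  # fewer samples than dimensions, as after a short calibration
normal_emb = rng.normal(size=(n_windows, dim))
print(np.linalg.matrix_rank(np.cov(normal_emb, rowvar=False)))  # at most n_windows - 1

mean, cov_inv = fit_shrunk_gaussian(normal_emb)
typical = rng.normal(size=(1, dim))
shifted = typical + 3.0  # synthetic anomaly: every dimension shifted
print(mahalanobis(typical, mean, cov_inv), mahalanobis(shifted, mean, cov_inv))
print(knn_score(typical, normal_emb), knn_score(shifted, normal_emb))
```

Look at the size of the typical sample's distance. In hundreds of dimensions even ordinary samples lie far from the mean: for a d-dimensional Gaussian the expected squared Mahalanobis distance is d. A fixed threshold such as 3.0 is therefore best replaced by a quantile of scores on held-out normal data.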

3.2 Eye-Movement Feature Extraction

import numpy as np


class GazeFeatureExtractor:
    """
    Eye-movement feature extractor
    """
    def __init__(self, fs=30):
        """
        Args:
            fs: sampling rate (Hz), i.e. the eye tracker's frame rate
        """
        self.fs = fs

    def extract_features(self, gaze_data):
        """
        Extract eye-movement features.

        Args:
            gaze_data: (N, 2) or (N, 3) array
                [x, y] or [x, y, timestamp]; x and y are assumed to be in
                degrees of visual angle so that the deg/s thresholds apply

        Returns:
            features: (N, 7) array, one row of features per sample
        """
        x = gaze_data[:, 0]
        y = gaze_data[:, 1]

        features = []

        # 1. Gaze point coordinates
        features.append(x)
        features.append(y)

        # 2. Velocity
        vx = np.gradient(x) * self.fs
        vy = np.gradient(y) * self.fs
        velocity = np.sqrt(vx**2 + vy**2)
        features.append(velocity)

        # 3. Acceleration
        ax = np.gradient(vx) * self.fs
        ay = np.gradient(vy) * self.fs
        acceleration = np.sqrt(ax**2 + ay**2)
        features.append(acceleration)

        # 4. Gaze dispersion (sliding window)
        window_size = int(self.fs * 2)  # 2-second window
        dispersion = self._calculate_dispersion(x, y, window_size)
        features.append(dispersion)

        # 5. Fixation detection
        fixations = self._detect_fixations(velocity, threshold=30)  # 30°/s
        fixation_ratio = self._calculate_fixation_ratio(fixations, window_size)
        features.append(fixation_ratio)

        # 6. Saccade detection
        saccades = self._detect_saccades(velocity, threshold=100)  # 100°/s
        saccade_rate = self._calculate_saccade_rate(saccades, window_size)
        features.append(saccade_rate)

        return np.column_stack(features)

    @staticmethod
    def _trailing(i, window_size):
        """Trailing window ending at sample i; shorter at the start so that
        short inputs (e.g. 60-90 frames) still get non-zero feature values."""
        return slice(max(0, i - window_size + 1), i + 1)

    def _calculate_dispersion(self, x, y, window_size):
        """Gaze dispersion: std(x) + std(y) over the trailing window"""
        dispersion = np.zeros(len(x))
        for i in range(len(x)):
            w = self._trailing(i, window_size)
            dispersion[i] = np.std(x[w]) + np.std(y[w])
        return dispersion

    def _detect_fixations(self, velocity, threshold=30):
        """Fixation samples: velocity below the threshold"""
        return velocity < threshold

    def _calculate_fixation_ratio(self, fixations, window_size):
        """Fraction of fixation samples in the trailing window"""
        ratio = np.zeros(len(fixations))
        for i in range(len(fixations)):
            ratio[i] = np.mean(fixations[self._trailing(i, window_size)])
        return ratio

    def _detect_saccades(self, velocity, threshold=100):
        """Saccade samples: velocity above the threshold"""
        return velocity > threshold

    def _calculate_saccade_rate(self, saccades, window_size):
        """Saccades per second in the trailing window (counts saccade onsets, not samples)"""
        onsets = saccades & ~np.concatenate(([False], saccades[:-1]))
        rate = np.zeros(len(saccades))
        for i in range(len(saccades)):
            w = self._trailing(i, window_size)
            rate[i] = np.sum(onsets[w]) / ((w.stop - w.start) / self.fs)
        return rate
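The 30°/s and 100°/s thresholds only make sense if x and y are angles in degrees. Raw eye-tracker output is often in millimetres, pixels or normalized screen units instead. The sketch below makes two assumptions: a flat plane at a fixed viewing distance, and a per-axis approximation. Under them it converts positions to visual angle. It then applies a velocity-threshold (I-VT-style) split and groups consecutive samples into fixation and saccade events. This yields fixation durations and saccade counts per event rather than per sample. The geometry, thresholds and function names are illustrative assumptions.

```python
import numpy as np


def to_visual_degrees(coord_mm, viewing_distance_mm=650.0):
    """Per-axis visual angle (degrees) of a position on a flat plane, measured
    in mm from the point straight ahead of the eye."""
    return np.degrees(np.arctan2(coord_mm, viewing_distance_mm))


def segment_events(x_deg, y_deg, fs=30, fix_thr=30.0, sac_thr=100.0):
    """Label samples by angular speed and group consecutive equal labels into events.

    Returns a list of (label, start_index, n_samples) tuples.
    """
    speed = np.hypot(np.gradient(x_deg), np.gradient(y_deg)) * fs  # deg/s
    labels = np.where(speed < fix_thr, "fixation",
                      np.where(speed > sac_thr, "saccade", "other"))
    events, start = [], 0
    for i in range(1, len(labels) + 1):
        # Close the current run when the label changes or the signal ends
        if i == len(labels) or labels[i] != labels[start]:
            events.append((str(labels[start]), start, i - start))
            start = i
    return events


def summarize(events, fs=30):
    """Fixation durations in seconds and the number of saccade events."""
    fixation_durations = [n / fs for label, _, n in events if label == "fixation"]
    n_saccades = sum(1 for label, _, _ in events if label == "saccade")
    return fixation_durations, n_saccades


x_deg = np.concatenate([np.zeros(30), np.full(30, 10.0)])  # fixation, 10° jump, fixation
y_deg = np.zeros(60)
events = segment_events(x_deg, y_deg)
print(events)  # [('fixation', 0, 29), ('saccade', 29, 2), ('fixation', 31, 29)]
print(summarize(events))
```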

3.3 End-to-End Detection Pipeline

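import numpy as np
import torch


class CognitiveDistractionDetector:
    """
    Cognitive distraction detector (based on Gaze-READ)
    """
    def __init__(self, model_path=None, window_len=90, quantile=0.95):
        self.feature_extractor = GazeFeatureExtractor(fs=30)
        self.gaze_read = GazeREAD()
        self.window_len = window_len  # 90 frames = 3 s at 30 Hz, same as detection windows
        self.quantile = quantile      # share of normal windows that should stay below the threshold
        self.threshold = 3.0          # replaced by a data-driven value in calibrate()

        if model_path:
            self.load_model(model_path)

    def _to_windows(self, features, stride=30):
        """Cut a (T, F) feature sequence into (window_len, F) windows."""
        return [features[s:s + self.window_len]
                for s in range(0, len(features) - self.window_len + 1, stride)]

    def calibrate(self, normal_driving_gaze_data):
        """
        Calibration: build a baseline from normal driving data.

        Args:
            normal_driving_gaze_data: eye-movement data recorded during normal
                driving, a list of (N, 2) arrays
        """
        # Extract features and cut every recording into windows of the detection
        # length, so calibration and detection see sequences of the same length
        windows = []
        for gaze_data in normal_driving_gaze_data:
            features = self.feature_extractor.extract_features(gaze_data)
            windows.extend(self._to_windows(features))
        if len(windows) < 10:
            raise ValueError("not enough normal driving data for calibration")

        # Hold out the last 20% of windows to choose the threshold
        n_holdout = max(1, len(windows) // 5)
        fit_windows, holdout_windows = windows[:-n_holdout], windows[-n_holdout:]

        # Build the normal representation
        fit_tensor = torch.tensor(np.stack(fit_windows), dtype=torch.float32)
        self.gaze_read.build_normal_representation(fit_tensor)

        # Threshold = chosen quantile of the scores of unseen normal windows
        holdout_scores = [
            self.gaze_read.compute_anomaly_score(
                torch.tensor(w, dtype=torch.float32).unsqueeze(0))
            for w in holdout_windows
        ]
        self.threshold = float(np.quantile(holdout_scores, self.quantile))

        print(f"Calibration done: {len(fit_windows)} normal driving windows, "
              f"threshold {self.threshold:.2f}")

    def detect(self, realtime_gaze_data):
        """
        Real-time cognitive distraction detection.

        Args:
            realtime_gaze_data: (N, 2) real-time eye-movement data;
                N of 60-90 frames (2-3 s) is recommended, ideally the same
                length as the calibration windows (window_len)

        Returns:
            is_distracted: bool
            anomaly_score: float
        """
        # Extract features
        features = self.feature_extractor.extract_features(realtime_gaze_data)

        # Convert to a (1, N, F) tensor
        gaze_tensor = torch.tensor(features, dtype=torch.float32).unsqueeze(0)

        # Detect anomaly against the calibrated threshold
        is_anomaly, score = self.gaze_read.detect_anomaly(gaze_tensor, threshold=self.threshold)

        return is_anomaly, score

    def save_model(self, path):
        """Save the normal-behavior baseline"""
        torch.save({
            'normal_mean': self.gaze_read.normal_mean,
            'normal_cov': self.gaze_read.normal_cov,
            'threshold': self.threshold,
        }, path)

    def load_model(self, path):
        """Load the normal-behavior baseline"""
        checkpoint = torch.load(path, map_location="cpu")
        self.gaze_read.normal_mean = checkpoint['normal_mean']
        self.gaze_read.normal_cov = checkpoint['normal_cov']
        self.threshold = checkpoint.get('threshold', self.threshold)


# Usage example
if __name__ == "__main__":
    # Initialize the detector
    detector = CognitiveDistractionDetector()

    # Calibration phase (collect normal driving data); the helper below is a placeholder
    # normal_data = collect_normal_driving_gaze()  # needs enough samples
    # detector.calibrate(normal_data)

    # Real-time detection (eye_tracker and trigger_warning are placeholders)
    # while driving:
    #     gaze_data = eye_tracker.get_gaze_points(duration=3.0)  # 3-second window (90 frames at 30 Hz)
    #     is_distracted, score = detector.detect(gaze_data)
    #
    #     if is_distracted:
    #         trigger_warning("Cognitive distraction detected")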
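The commented-out loop implies a sliding window over a live stream. One way to build it uses a fixed-size ring buffer that hands out a full window every `hop` frames instead of every frame. This assumes a 30 Hz tracker and the 60-90 frame windows recommended in `detect`, and reflects that running a large encoder on every frame is rarely affordable. `StreamingWindow` and its parameters are hypothetical. The detector call appears only as a comment.

```python
from collections import deque

import numpy as np


class StreamingWindow:
    """Keep the last `window` gaze samples and release a full window every `hop` new samples."""

    def __init__(self, window=90, hop=15):
        self.buffer = deque(maxlen=window)
        self.hop = hop
        self.since_last = 0

    def push(self, sample):
        """Add one (x, y) sample; return a (window, 2) array when a detection is due, else None."""
        self.buffer.append(sample)
        self.since_last += 1
        if len(self.buffer) == self.buffer.maxlen and self.since_last >= self.hop:
            self.since_last = 0
            return np.asarray(self.buffer)
        return None


stream = StreamingWindow(window=90, hop=15)
windows_emitted = 0
for t in range(300):  # 10 s of synthetic samples at 30 Hz
    window = stream.push((np.sin(t / 10), np.cos(t / 10)))
    if window is not None:
        windows_emitted += 1  # here one would call detector.detect(window)
print(windows_emitted)  # 15: first full buffer at frame 89, then every 15 frames
```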

4. Experimental Validation

4.1 Datasets

Datasets used in the paper:

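Group              | Scenario                                 | Participants
Control group      | Normal lecture attendance / driving      | 30
Experimental group | With distractors (phone / conversation)  | 30
Session length: 10-15 minutes per participant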

4.2 Detection Performance

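Metric              | Value | Description
AUC                 | 0.89  | Area under the ROC curve
Accuracy            | 82.3% | Binary classification
False-positive rate | 15.2% | Normal behavior misclassified as distraction
Miss rate           | 20.3% | Distraction not detected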
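The three error figures are internally consistent if the test set is roughly balanced between normal and distracted windows. The equal group sizes (30 vs. 30) suggest this, but balance is our assumption. Under it, accuracy is the mean of specificity and sensitivity, that is 1 - (FPR + FNR) / 2. The helper below computes the same metrics from confusion-matrix counts. The counts are hypothetical, chosen to reproduce the reported rates.

```python
def rates(tp, fn, tn, fp):
    """Return (accuracy, false-positive rate, miss rate) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    fpr = fp / (fp + tn)  # normal windows flagged as distracted
    fnr = fn / (fn + tp)  # distracted windows that were missed
    return accuracy, fpr, fnr


# Balanced-class identity applied to the reported rates
reported_fpr, reported_fnr = 0.152, 0.203
print(round(1 - (reported_fpr + reported_fnr) / 2, 4))  # 0.8225, i.e. about 82.3%

# Same check with hypothetical counts: 1000 normal and 1000 distracted windows
acc, fpr, fnr = rates(tp=797, fn=203, tn=848, fp=152)
print(acc, fpr, fnr)  # 0.8225 0.152 0.203
```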

Comparison with traditional methods:

Method           | AUC  | Accuracy | Requires labels
Fixed threshold  | 0.72 | 68%      | No
SVM (supervised) | 0.85 | 78%      | Yes
Gaze-READ        | 0.89 | 82%      | No
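AUC, the headline metric in this comparison, is computed from raw anomaly scores and does not depend on any single threshold. That suits an anomaly detector whose alarm threshold is set later. Below is a minimal NumPy version in the pairwise-comparison (Mann-Whitney) form. It assumes a higher score means more anomalous and counts ties as one half. It is quadratic in the number of windows, which is fine for evaluation-sized sets.

```python
import numpy as np


def auc_from_scores(normal_scores, distracted_scores):
    """Probability that a random distracted window scores higher than a random normal one."""
    normal = np.asarray(normal_scores, dtype=float)[:, None]
    distracted = np.asarray(distracted_scores, dtype=float)[None, :]
    wins = (distracted > normal).sum() + 0.5 * (distracted == normal).sum()
    return wins / (normal.size * distracted.size)


print(auc_from_scores([1.0, 2.0, 3.0], [2.5, 4.0, 5.0]))  # 8 of 9 pairs ranked correctly
```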

5. Implications for IMS Development

5.1 Recommended Technical Roadmap

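# Phased implementation roadmap
class CognitiveDistractionRoadmap:
    def __init__(self):
        self.phases = {
            "Phase 1 (2026)": {
                "method": "Fixed thresholds + rules",
                "scenario": "Extreme cognitive distraction (prolonged blank staring)",
                "goal": "Meet the basic Euro NCAP requirements",
            },
            "Phase 2 (2027)": {
                "method": "Gaze-READ (unsupervised)",
                "scenario": "Moderate cognitive distraction",
                "goal": "Improve detection accuracy",
            },
            "Phase 3 (2028+)": {
                "method": "Multimodal fusion (eye movement + physiology + vehicle)",
                "scenario": "Cognitive distraction in all scenarios",
                "goal": "High-accuracy production solution",
            },
        }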
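Phase 1's "fixed thresholds + rules" for extreme cases such as prolonged blank staring could take the form of the rule below. It raises a flag when gaze stays inside a small region for longer than a set duration. The 1.5° radius and 10 s limit are placeholders chosen for illustration, not Euro NCAP figures. Gaze is assumed to be given as angles in degrees.

```python
import numpy as np


def prolonged_stare(gaze_deg, fs=30, radius_deg=1.5, max_dwell_s=10.0):
    """Return True if the most recent dwell (gaze staying within radius_deg of the
    point where the dwell started) has lasted longer than max_dwell_s.

    gaze_deg: (N, 2) array of gaze angles (yaw, pitch) in degrees, sampled at fs Hz.
    """
    anchor = gaze_deg[0]
    dwell = 1
    for point in gaze_deg[1:]:
        if np.hypot(*(point - anchor)) <= radius_deg:
            dwell += 1                # still inside the small region
        else:
            anchor, dwell = point, 1  # gaze moved away: a new dwell starts here
    return dwell / fs > max_dwell_s


still = np.zeros((360, 2))  # 12 s of perfectly still gaze at 30 Hz
t = np.arange(360)
scanning = np.column_stack([10 * np.sin(t / 15), np.zeros(360)])  # regular side-to-side scanning
print(prolonged_stare(still), prolonged_stare(scanning))  # True False
```

A rule like this only catches the extreme end. Subtler cases, where gaze still moves but less purposefully, are what the later phases target.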

5.2 System Integration Notes

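# Integration with an existing DMS
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    kind: str
    severity: int
    score: Optional[float] = None


@dataclass
class Status:
    state: str


class DMSWithCognitiveDistraction:
    def __init__(self, fatigue_detector, visual_distraction_detector, cognitive_detector):
        # The fatigue and visual-distraction detectors stand for the DMS's existing
        # modules (any objects with a detect() method); cognitive_detector is a
        # CognitiveDistractionDetector
        self.fatigue_detector = fatigue_detector
        self.visual_distraction_detector = visual_distraction_detector
        self.cognitive_detector = cognitive_detector

    def process_frame(self, frame, eye_tracker_data):
        # 1. Fatigue detection (highest priority)
        if self.fatigue_detector.detect(frame):
            return Alert("FATIGUE", severity=2)

        # 2. Visual distraction detection
        if self.visual_distraction_detector.detect(eye_tracker_data):
            return Alert("VISUAL_DISTRACTION", severity=1)

        # 3. Cognitive distraction detection (background task; in practice run it
        #    every few frames rather than on every frame)
        gaze_sequence = eye_tracker_data.get_sequence(duration=3.0)  # hypothetical buffer API: last 3 s
        is_cognitive, score = self.cognitive_detector.detect(gaze_sequence)

        if is_cognitive and score > 4.0:  # extra margin for high confidence (illustrative value)
            return Alert("COGNITIVE_DISTRACTION", severity=1, score=score)

        return Status("NORMAL")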
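One anomalous 3-second window is weak evidence on its own, so an integration layer would typically debounce the cognitive-distraction signal before warning the driver. The minimal sketch below assumes the detector yields one boolean per evaluated window. It requires `k_on` consecutive anomalous windows to raise the alert and `k_off` consecutive normal windows to clear it. The class name and counts are hypothetical tuning values.

```python
class Debouncer:
    """Hysteresis on a boolean stream: switch on after k_on consecutive Trues,
    switch off after k_off consecutive Falses."""

    def __init__(self, k_on=3, k_off=5):
        self.k_on, self.k_off = k_on, k_off
        self.active = False
        self.run = 0  # length of the current run of inputs that disagree with `active`

    def update(self, is_anomaly):
        if is_anomaly != self.active:
            self.run += 1
            needed = self.k_on if is_anomaly else self.k_off
            if self.run >= needed:
                self.active, self.run = is_anomaly, 0
        else:
            self.run = 0  # agreement resets the pending switch
        return self.active


deb = Debouncer(k_on=3, k_off=2)
inputs = [True, False, True, True, True, False, True, False, False]
result = [deb.update(x) for x in inputs]
print(result)  # [False, False, False, False, True, True, True, True, False]
```

Asymmetric counts let the alert come on cautiously but stay on through a single normal-looking window.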

5.3 Deployment Challenges

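Challenge              | Mitigation
Individual differences | Calibrate at the start of each drive (1-2 minutes)
Scene variation        | Collect normal-driving data across many scenarios
Compute resources      | Quantize/prune the MOMENT model
Eye-tracker accuracy   | Hardware selection (≥60 Hz)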
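The 1-2 minute calibration and the ≥60 Hz hardware requirement interact with the Mahalanobis model. The number of calibration windows bounds the rank of the sample covariance, so it is worth checking before deployment. The arithmetic below assumes 3-second windows with a 1-second hop (our choice) and the 1024-dimensional MOMENT-large embedding.

```python
def calibration_windows(duration_s, fs, window_s=3.0, hop_s=1.0):
    """Number of full sliding windows obtainable from a calibration recording."""
    n_samples = int(duration_s * fs)
    window, hop = int(window_s * fs), int(hop_s * fs)
    return 0 if n_samples < window else (n_samples - window) // hop + 1


for minutes in (1, 2):
    n = calibration_windows(minutes * 60, fs=60)
    print(f"{minutes} min at 60 Hz -> {n} windows, covariance rank <= {n - 1} (vs. 1024 dims)")
```

The count depends only on duration and hop, not on the sampling rate. A faster tracker adds detail within each window but does not add windows, so the covariance still needs regularization, or the embeddings a lower-dimensional projection.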

6. Summary

The Gaze-READ framework offers a new approach to cognitive distraction detection:

Core strengths:

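  1. Unsupervised learning: no large labeled dataset required
  2. Foundation model: leverages pretrained time-series knowledge
  3. Individual adaptation: detection against a personal baseline
  4. Real-time potential: the anomaly-scoring step itself is lightweight (the MOMENT encoder is the main compute cost)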

Open issues:

  1. Simplify the calibration procedure
  2. Robustness in extreme scenarios
  3. Extension to multimodal fusion

Recommendations regarding Euro NCAP 2026:

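  • There are no explicit quantitative criteria for cognitive distraction yet
  • Treat it as a technology reserve that can earn bonus points
  • It may become a mandatory item after 2027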

References:

  • “A foundation model-based framework for unsupervised gaze anomaly detection”, ScienceDirect 2025
  • “MOMENT: A Family of Open Time-series Foundation Models”
  • Euro NCAP 2026 Assessment Protocol
