Transformer-Based SOTA Drowsiness Detection: Real-Time Deployment of ViT/Swin Architectures at 99.15% Accuracy

Paper Information

  • Title: Real-time driver drowsiness detection using transformer architectures: a novel deep learning approach
  • Journal: Scientific Reports (Nature Portfolio)
  • Published: 2025
  • DOI: 10.1038/s41598-025-02111-x
  • Study type: deep learning algorithm research

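Key Contributions

The study presents itself as the first systematic application of Vision Transformer (ViT) and Swin Transformer to driver drowsiness detection. It reports 99.15% accuracy on the MRL dataset, ahead of conventional CNN architectures. The key contributions are: (1) it shows that the Transformer's global attention can capture long-range dependencies among eye-region features, which addresses the limited local receptive field of CNNs; (2) it proposes an explainability scheme based on CAM (Class Activation Mapping) to meet the trust requirements of in-vehicle systems; (3) it achieves real-time inference on NVIDIA Jetson platforms with latency below 25 ms.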

Method Details

1. Overall Architecture

┌───────────────────────────────────────────────────────────────┐
│                      Input preprocessing                      │
│ Image size: 224×224 | Normalize: [0,1] | Augment: flip/rotate │
└───────────────────────────────────────────────────────────────┘
                                ↓
┌───────────────────────────────────────────────────────────────┐
│                    ViT / Swin Transformer                     │
├───────────────────────────────────────────────────────────────┤
│ ViT:  Patch Embedding → Transformer Encoder → MLP Head        │
│ Swin: Patch Partition → Stage ×4 → Classification Head        │
└───────────────────────────────────────────────────────────────┘
                                ↓
┌───────────────────────────────────────────────────────────────┐
│                     Classification output                     │
│                Open-Eyes / Close-Eyes (binary)                │
└───────────────────────────────────────────────────────────────┘
                                ↓
┌───────────────────────────────────────────────────────────────┐
│                      Drowsiness scoring                       │
│         PERCLOS threshold: 15 frames → trigger alarm          │
└───────────────────────────────────────────────────────────────┘

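2. Vision Transformer (ViT) Architecture

2.1 Patch Embedding

The input image is split into fixed-size patches. Each patch is flattened and linearly projected, a class token is prepended, and position embeddings are added:

$$\mathbf{z}_0 = [\mathbf{x}_{class}; \mathbf{x}_p^1 E; \mathbf{x}_p^2 E; \cdots; \mathbf{x}_p^N E] + \mathbf{E}_{pos}$$

where:

  • $\mathbf{x}_p^i \in \mathbb{R}^{P^2 \cdot C}$: the $i$-th flattened patch ($P=16$, $C=3$)
  • $E \in \mathbb{R}^{(P^2 \cdot C) \times D}$: linear projection matrix
  • $\mathbf{E}_{pos} \in \mathbb{R}^{(N+1) \times D}$: position embeddings
  • $N = HW/P^2 = 196$: number of patches (for $H = W = 224$)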
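As a quick check of the shapes in the definitions above, the following sketch computes the patch count, the flattened patch length, and the shape of the token sequence. The helper name `vit_token_shapes` is hypothetical, and the embedding width D = 768 is taken from the ViT-Base diagram later in this post.

```python
def vit_token_shapes(height=224, width=224, patch=16, channels=3, dim=768):
    """Return (N, P^2*C, shape of z_0) for a ViT-style patch embedding."""
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    num_patches = (height // patch) * (width // patch)  # N = HW / P^2
    patch_dim = patch * patch * channels                # length of one flattened patch
    seq_shape = (num_patches + 1, dim)                  # +1 row for the class token
    return num_patches, patch_dim, seq_shape


n, patch_dim, seq_shape = vit_token_shapes()
print(n, patch_dim, seq_shape)  # 196 768 (197, 768)
```

With P = 16 and C = 3 each flattened patch already has 768 values, so for ViT-Base the projection matrix E happens to be square (768 × 768).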

2.2 Transformer Encoder

Each layer applies multi-head self-attention (MSA) and an MLP, each preceded by LayerNorm (LN) and wrapped in a residual connection:

$$\mathbf{z}'_l = \text{MSA}(\text{LN}(\mathbf{z}_{l-1})) + \mathbf{z}_{l-1}$$

$$\mathbf{z}_l = \text{MLP}(\text{LN}(\mathbf{z}'_l)) + \mathbf{z}'_l$$

Multi-head attention computation

$$\text{Attention}(\mathbf{Q}, \mathbf{K}, \mathbf{V}) = \text{softmax}\left(\frac{\mathbf{Q}\mathbf{K}^T}{\sqrt{d_k}}\right)\mathbf{V}$$
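To make the attention formula concrete, here is a minimal single-head sketch in plain Python, with no batching and no learned projections. The function names are illustrative only; a real ViT computes this with batched tensor operations across several heads.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for lists of row vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Scaled dot product of this query with every key
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out


# A zero query scores every key equally, so the output is the mean of the values.
print(attention([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]))  # [[0.5, 0.5]]
```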

2.3 ViT Architecture Diagram

Input Image (224×224×3)
        ↓
Patchify (16×16)
        ↓
Linear Projection
        ↓
Add Position Embedding
        ↓
┌───────────────────────────┐
│  Transformer Encoder ×12  │
├───────────────────────────┤
│        Layer Norm         │
│             ↓             │
│   Multi-Head Attention    │ ←── residual connection
│     (Heads=12, D=768)     │
│             ↓             │
│        Layer Norm         │
│             ↓             │
│        MLP (GELU)         │ ←── residual connection
│    (768 → 3072 → 768)     │
└───────────────────────────┘
        ↓
MLP Head
(768 → 2)
        ↓
Softmax
(Open/Close)

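3. Swin Transformer Architecture

3.1 Hierarchical Design

Swin Transformer uses a four-stage hierarchy:

Stage  Resolution  Dim  Layers  Heads
1      56×56       96   2       3
2      28×28       192  2       6
3      14×14       384  6       12
4      7×7         768  2       24

3.2 Window Attention

Attention is computed inside local windows, which makes the cost linear in the number of tokens $N$ for a fixed window size, instead of quadratic as in global attention:

$$\text{Complexity} = O(N) \quad \text{vs.} \quad O(N^2) \text{ (global)}$$

Window size: $M = 7$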
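The gap between the two complexities can be put into numbers with the operation-count estimates from the Swin Transformer paper (Liu et al., 2021), which ignore the softmax: 4hwC² + 2(hw)²C for global MSA and 4hwC² + 2M²hwC for window attention. The sketch below evaluates both for stage 1 of the table above; the function names are illustrative.

```python
def msa_flops(h, w, C):
    """Global multi-head self-attention on an h×w token map with C channels."""
    return 4 * h * w * C**2 + 2 * (h * w) ** 2 * C


def wmsa_flops(h, w, C, M=7):
    """Window attention with M×M windows: the second term is linear in h*w."""
    return 4 * h * w * C**2 + 2 * M**2 * h * w * C


# Stage 1 of Swin-Tiny: 56×56 tokens, 96 channels
h, w, C = 56, 56, 96
print(msa_flops(h, w, C) / wmsa_flops(h, w, C))  # global attention costs roughly 14x more here
```

Doubling the number of tokens doubles the window-attention cost, while the quadratic term of global attention grows fourfold.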

3.3 Shifted Window Attention

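Consecutive layers alternate between regular and shifted windows:

Layer L:     ┌───┬───┐
             │ A │ B │  regular windows
             ├───┼───┤
             │ C │ D │
             └───┴───┘

Layer L+1:   ┌───┬───┐
             │ B │ A │  shifted windows (shift = M//2)
             ├───┼───┤
             │ D │ C │
             └───┴───┘

The shift is what lets information flow across window boundaries.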
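A toy sketch of the idea: on a 4×4 token grid with 2×2 windows, cyclically shifting the grid by M//2 = 1 before partitioning places tokens that used to sit in different windows into the same one. The helper `window_ids` is hypothetical, and the sketch leaves out the attention mask that Swin applies to tokens wrapped around by the cyclic shift.

```python
def window_ids(size, M, shift=0):
    """Window index of every token on a size×size grid after a cyclic shift.

    A token at (r, c) moves to ((r - shift) % size, (c - shift) % size),
    then the grid is cut into M×M windows numbered row by row.
    """
    per_row = size // M
    grid = []
    for r in range(size):
        row = []
        for c in range(size):
            rr, cc = (r - shift) % size, (c - shift) % size
            row.append((rr // M) * per_row + (cc // M))
        grid.append(row)
    return grid


regular = window_ids(4, 2)           # layer L
shifted = window_ids(4, 2, shift=1)  # layer L+1, shift = M // 2

# Tokens (1, 1) and (2, 2) are neighbours but fall in different regular windows...
print(regular[1][1], regular[2][2])  # 0 3
# ...and share a window once the grid is shifted.
print(shifted[1][1], shifted[2][2])  # 0 0
```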

4. Drowsiness Detection Pipeline

┌───────────────────────────────────────────────────────────────┐
│            Real-time drowsiness detection pipeline            │
└───────────────────────────────────────────────────────────────┘

Step 1: Face detection (Haar Cascade)
    ↓
Step 2: Eye ROI extraction
    ├── Left eye  (x1, y1, x2, y2)
    └── Right eye (x1, y1, x2, y2)
    ↓
Step 3: Eye image preprocessing
    ├── Resize (224×224)
    ├── Grayscale → RGB conversion
    └── Normalization [0,1]
    ↓
Step 4: Transformer inference
    ├── Left-eye state prediction
    └── Right-eye state prediction
    ↓
Step 5: Drowsiness score update
    if eyes_closed:
        score += 1
    else:
        score -= 1
    score = max(0, score)
    ↓
Step 6: Alarm trigger
    if score >= 15:  # frames
        trigger_alarm()
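Steps 5 and 6 amount to a counter that rises on closed-eye frames and decays, never below zero, on open-eye frames. The sketch below uses a hypothetical helper, `first_alarm_frame`, to replay a sequence of per-frame eye states and report the frame index at which the alarm would first fire. The threshold of 15 comes from the pipeline above; the frame sequences are made up for illustration.

```python
def first_alarm_frame(closed_flags, threshold=15):
    """Index of the first frame whose score reaches the threshold, else None."""
    score = 0
    for i, closed in enumerate(closed_flags):
        # +1 for a closed-eye frame, -1 (floored at 0) for an open-eye frame
        score = score + 1 if closed else max(0, score - 1)
        if score >= threshold:
            return i
    return None


print(first_alarm_frame([True] * 15))  # 14: fifteen closed frames in a row
print(first_alarm_frame([True] * 14))  # None: one frame short
# A short re-opening delays the alarm instead of resetting the score.
print(first_alarm_frame([True] * 10 + [False] * 2 + [True] * 7))  # 18
```

Assuming a 30 fps camera, 15 consecutive closed-eye frames correspond to about half a second of eye closure.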

5. Data Augmentation Strategy

# Training-time data augmentation
augmentation_pipeline = {
    'horizontal_flip': 0.5,      # probability of a horizontal flip
    'rotation': 15,              # rotation range in degrees
    'brightness': [0.8, 1.2],    # brightness scaling range
    'contrast': [0.8, 1.2],      # contrast scaling range
    'shift_scale_rotate': {
        'shift_limit': 0.1,
        'scale_limit': 0.1,
        'rotate_limit': 15
    }
}

Code Reproduction

Environment Setup

# Import dependencies
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import transforms
from PIL import Image
import cv2
import numpy as np
import timm  # PyTorch Image Models

ViT Model Implementation

class ViTDrowsinessDetector(nn.Module):
    """Vision Transformer drowsiness (eye-state) detector."""

    def __init__(self, model_name='vit_base_patch16_224', num_classes=2, pretrained=True):
        super().__init__()

        # Load a pretrained ViT; num_classes=0 removes timm's classification head
        self.backbone = timm.create_model(
            model_name,
            pretrained=pretrained,
            num_classes=0
        )

        # Dimension of the pooled backbone features (768 for ViT-Base)
        self.feature_dim = self.backbone.num_features

        # Custom classification head
        self.classifier = nn.Sequential(
            nn.LayerNorm(self.feature_dim),
            nn.Linear(self.feature_dim, 512),
            nn.GELU(),
            nn.Dropout(0.3),
            nn.Linear(512, num_classes)
        )

        # Initialize the head's weights
        self._init_weights()

    def _init_weights(self):
        for m in self.classifier.modules():
            if isinstance(m, nn.Linear):
                nn.init.trunc_normal_(m.weight, std=0.02)
                if m.bias is not None:
                    nn.init.zeros_(m.bias)

    def forward(self, x, return_attention=False):
        """
        Args:
            x: (B, 3, 224, 224) input images
            return_attention: whether to also return the attention map
        """
        # Extract features
        features = self.backbone(x)  # (B, 768)

        # Classify
        logits = self.classifier(features)

        if return_attention:
            # Attention map for explainability (runs the backbone a second time)
            attention = self._get_attention_map(x)
            return logits, attention

        return logits

    def _get_attention_map(self, x):
        """Extract a CLS-token attention heat map from the last block.

        timm's Attention.forward returns only the projected tokens, not the
        attention weights, so the weights are recomputed from the block's qkv
        projection. This assumes the module exposes `qkv`, `num_heads` and
        `scale`, as timm's standard ViT blocks do; verify for other models.
        """
        captured = []

        def hook_fn(module, inputs, output):
            tokens = inputs[0]  # (B, N+1, C), already layer-normalized
            B, N, C = tokens.shape
            qkv = module.qkv(tokens).reshape(B, N, 3, module.num_heads, C // module.num_heads)
            q, k, _ = qkv.permute(2, 0, 3, 1, 4).unbind(0)
            attn = (q @ k.transpose(-2, -1)) * module.scale
            captured.append(attn.softmax(dim=-1))  # (B, heads, N+1, N+1)

        # Hook only the attention module of the last block
        handle = self.backbone.blocks[-1].attn.register_forward_hook(hook_fn)
        try:
            with torch.no_grad():
                self.backbone(x)
        finally:
            handle.remove()

        # Attention from the CLS token to every patch, averaged over heads
        attn = captured[-1][:, :, 0, 1:].mean(dim=1)  # (B, N)

        # Reshape to the 14×14 patch grid
        attn = attn.reshape(attn.size(0), 1, 14, 14)

        # Upsample to the input resolution
        attn = F.interpolate(
            attn,
            size=(224, 224),
            mode='bilinear',
            align_corners=False
        )

        return attn.squeeze(1)


class SwinDrowsinessDetector(nn.Module):
    """Swin Transformer drowsiness (eye-state) detector."""

    def __init__(self, model_name='swin_tiny_patch4_window7_224', num_classes=2, pretrained=True):
        super().__init__()

        # Load a pretrained Swin; num_classes=0 removes timm's classification head
        self.backbone = timm.create_model(
            model_name,
            pretrained=pretrained,
            num_classes=0
        )

        self.feature_dim = self.backbone.num_features

        self.classifier = nn.Sequential(
            nn.LayerNorm(self.feature_dim),
            nn.Linear(self.feature_dim, 256),
            nn.GELU(),
            nn.Dropout(0.2),
            nn.Linear(256, num_classes)
        )

    def forward(self, x):
        features = self.backbone(x)
        return self.classifier(features)

Real-Time Detection System

class RealTimeDrowsinessSystem:
    """Real-time drowsiness detection system."""

    def __init__(self, model_path, device='cuda'):
        self.device = device

        # Load the trained model
        self.model = ViTDrowsinessDetector(
            model_name='vit_base_patch16_224',
            num_classes=2,
            pretrained=False
        )
        self.model.load_state_dict(torch.load(model_path, map_location=device))
        self.model.to(device)
        self.model.eval()

        # Load the Haar cascade face and eye detectors shipped with OpenCV
        self.face_cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
        )
        self.eye_cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + 'haarcascade_eye.xml'
        )

        # Preprocessing (same ImageNet statistics as in training)
        self.transform = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
        ])

        # Drowsiness score state
        self.drowsiness_score = 0
        self.score_threshold = 15
        self.history = []

    def detect_eyes(self, frame):
        """Detect faces and the eye regions inside them."""
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Detect faces
        faces = self.face_cascade.detectMultiScale(gray, 1.3, 5)

        eye_regions = []
        for (x, y, w, h) in faces:
            roi_gray = gray[y:y+h, x:x+w]
            roi_color = frame[y:y+h, x:x+w]

            # Detect eyes inside the face region
            eyes = self.eye_cascade.detectMultiScale(roi_gray)

            for (ex, ey, ew, eh) in eyes:
                eye_img = roi_color[ey:ey+eh, ex:ex+ew]
                eye_regions.append({
                    'image': eye_img,
                    'bbox': (x+ex, y+ey, ew, eh)  # in full-frame coordinates
                })

        return eye_regions, faces

    def predict_eye_state(self, eye_img):
        """Predict the eye state; returns True if the eye is closed."""
        try:
            # Convert the BGR crop to a PIL RGB image
            eye_pil = Image.fromarray(cv2.cvtColor(eye_img, cv2.COLOR_BGR2RGB))

            # Preprocess
            eye_tensor = self.transform(eye_pil).unsqueeze(0).to(self.device)

            # Inference
            with torch.no_grad():
                output = self.model(eye_tensor)
                prob = F.softmax(output, dim=1)
                pred = torch.argmax(prob, dim=1).item()

            # pred: 0=Close, 1=Open. This must match the class order used in
            # training (ImageFolder assigns indices to folder names alphabetically).
            return pred == 0

        except Exception as e:
            # On failure, fall back to "open" so a bad crop cannot raise the score
            print(f"Prediction error: {e}")
            return False

    def update_score(self, eyes_closed):
        """Update the drowsiness score."""
        if eyes_closed:
            self.drowsiness_score += 1
        else:
            self.drowsiness_score = max(0, self.drowsiness_score - 1)

        self.history.append(self.drowsiness_score)

        # Keep only the most recent 100 frames of history
        if len(self.history) > 100:
            self.history.pop(0)

    def check_drowsiness(self):
        """Return True once the score reaches the threshold."""
        return self.drowsiness_score >= self.score_threshold

    def run(self, video_source=0):
        """Main loop."""
        cap = cv2.VideoCapture(video_source)

        while True:
            ret, frame = cap.read()
            if not ret:
                break

            # Detect eyes
            eye_regions, faces = self.detect_eyes(frame)

            # Draw face boxes
            for (x, y, w, h) in faces:
                cv2.rectangle(frame, (x, y), (x+w, y+h), (255, 0, 0), 2)

            # Predict the state of each detected eye
            all_closed = True
            for eye in eye_regions:
                is_closed = self.predict_eye_state(eye['image'])

                # Draw the eye box: red if closed, green if open
                ex, ey, ew, eh = eye['bbox']
                color = (0, 0, 255) if is_closed else (0, 255, 0)
                cv2.rectangle(frame, (ex, ey), (ex+ew, ey+eh), color, 2)

                if not is_closed:
                    all_closed = False

            # Update the score only when at least one eye was detected
            if len(eye_regions) > 0:
                self.update_score(all_closed)

            # Show the current score
            cv2.putText(frame, f'Score: {self.drowsiness_score}', (10, 30),
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

            # Check for drowsiness
            if self.check_drowsiness():
                cv2.putText(frame, 'DROWSY ALERT!', (10, 70),
                            cv2.FONT_HERSHEY_SIMPLEX, 1.5, (0, 0, 255), 3)
                # Trigger the alarm
                self._trigger_alarm()

            cv2.imshow('Drowsiness Detection', frame)

            if cv2.waitKey(1) & 0xFF == ord('q'):
                break

        cap.release()
        cv2.destroyAllWindows()

    def _trigger_alarm(self):
        """Trigger the alarm."""
        # A sound or vibration alert can be integrated here
        print("ALERT: Drowsy driver detected!")

# Training script
def train_vit_drowsiness(train_dir, val_dir, epochs=30, batch_size=32):
    """Train the ViT drowsiness detection model."""
    from torchvision.datasets import ImageFolder

    # Data loading
    train_transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.RandomHorizontalFlip(),
        transforms.RandomRotation(15),
        transforms.ColorJitter(brightness=0.2, contrast=0.2),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])

    val_transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])

    train_dataset = ImageFolder(train_dir, transform=train_transform)
    val_dataset = ImageFolder(val_dir, transform=val_transform)

    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=batch_size, shuffle=True, num_workers=4
    )
    val_loader = torch.utils.data.DataLoader(
        val_dataset, batch_size=batch_size, shuffle=False, num_workers=4
    )

    # Model initialization
    model = ViTDrowsinessDetector(
        model_name='vit_base_patch16_224',
        num_classes=2,
        pretrained=True
    )
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model.to(device)

    # Loss and optimizer: small learning rate for the pretrained backbone,
    # larger one for the freshly initialized head
    criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
    optimizer = torch.optim.AdamW([
        {'params': model.backbone.parameters(), 'lr': 1e-5},
        {'params': model.classifier.parameters(), 'lr': 1e-3}
    ], weight_decay=1e-4)

    scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
        optimizer, T_0=10, T_mult=2
    )

    # Training loop
    best_acc = 0
    for epoch in range(epochs):
        model.train()
        train_loss = 0
        correct = 0
        total = 0

        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)

            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            train_loss += loss.item()
            _, predicted = outputs.max(1)
            total += labels.size(0)
            correct += predicted.eq(labels).sum().item()

        # Validation
        model.eval()
        val_correct = 0
        val_total = 0
        with torch.no_grad():
            for images, labels in val_loader:
                images, labels = images.to(device), labels.to(device)
                outputs = model(images)
                _, predicted = outputs.max(1)
                val_total += labels.size(0)
                val_correct += predicted.eq(labels).sum().item()

        train_acc = correct / total
        val_acc = val_correct / val_total

        # Keep the checkpoint with the best validation accuracy
        if val_acc > best_acc:
            best_acc = val_acc
            torch.save(model.state_dict(), 'best_vit_drowsiness.pth')

        print(f'Epoch {epoch+1}/{epochs}: '
              f'Train Loss={train_loss/len(train_loader):.4f}, '
              f'Train Acc={train_acc:.4f}, '
              f'Val Acc={val_acc:.4f}')

        scheduler.step()

    return model

Experimental Results

1. Dataset Statistics

Dataset   Total samples  Open-Eyes  Close-Eyes  Resolution  Environment
MRL Eye   84,898         42,952     41,946      various     varied lighting
NTHU-DDD  66,521         30,491     36,030      640×480     day/night
CEW       27,200         -          -           various     in the wild

2. Model Performance Comparison

Model        Architecture  Params  MRL acc.  NTHU acc.  CEW acc.  Average
VGG19        CNN           143M    98.7%     96.5%      94.2%     96.5%
ResNet50V2   CNN           25.6M   97.3%     95.8%      93.7%     95.6%
DenseNet169  CNN           14.1M   96.8%     94.2%      92.1%     94.4%
MobileNetV3  CNN           5.4M    94.5%     92.3%      89.6%     92.1%
ViT-Base     Transformer   86M     99.15%    98.2%      96.8%     98.0%
Swin-Tiny    Transformer   28M     98.9%     97.8%      95.9%     97.5%

3. Detailed Comparison of Key Metrics

Model      Accuracy  Precision  Recall  F1-Score  AUC
VGG19      98.7%     98.5%      98.9%   98.7%     0.997
ViT-Base   99.15%    99.1%      99.2%   99.1%     0.999
Swin-Tiny  98.9%     98.7%      99.1%   98.9%     0.998
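The F1 column can be cross-checked from the precision and recall columns, since F1 is their harmonic mean. The sketch below recomputes it from the values in the table; the helper name `f1_score` is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in percent here)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table above
reported = {
    "VGG19": (98.5, 98.9),
    "ViT-Base": (99.1, 99.2),
    "Swin-Tiny": (98.7, 99.1),
}
for name, (p, r) in reported.items():
    print(f"{name}: F1 = {f1_score(p, r):.1f}%")
# VGG19: F1 = 98.7%
# ViT-Base: F1 = 99.1%
# Swin-Tiny: F1 = 98.9%
```

All three recomputed values agree with the table to one decimal place.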

4. Lighting Robustness Test

Lighting condition  VGG19 acc.  ViT acc.  Swin acc.
Normal              99.2%       99.5%     99.3%
Low light           92.3%       96.8%     95.2%
Strong light        94.1%       97.2%     96.5%
Backlight           89.7%       94.5%     93.1%
Average             93.8%       97.0%     96.0%

5. Edge Deployment Performance

Platform         Model      Latency  Frame rate  Memory  Power
Jetson Nano      ViT-Tiny   45 ms    22 fps      850 MB  5 W
Jetson AGX Orin  ViT-Base   18 ms    55 fps      2.1 GB  12 W
Jetson AGX Orin  Swin-Tiny  12 ms    83 fps      1.8 GB  10 W
Qualcomm 8255    Swin-Tiny  15 ms    66 fps      1.5 GB  6 W
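The frame-rate column follows directly from the latency column if frames are processed one at a time: a per-frame latency of t milliseconds allows at most 1000 / t frames per second. The sketch below reproduces the table's values under that assumption, rounding down to whole frames; the helper name is illustrative.

```python
def fps_from_latency(latency_ms):
    """Whole frames per second achievable at a given per-frame latency."""
    return int(1000 // latency_ms)


# Latencies (ms) copied from the table above
for platform, latency in [("Jetson Nano / ViT-Tiny", 45),
                          ("Jetson AGX Orin / ViT-Base", 18),
                          ("Jetson AGX Orin / Swin-Tiny", 12),
                          ("Qualcomm 8255 / Swin-Tiny", 15)]:
    print(f"{platform}: {fps_from_latency(latency)} fps")
# Jetson Nano / ViT-Tiny: 22 fps
# Jetson AGX Orin / ViT-Base: 55 fps
# Jetson AGX Orin / Swin-Tiny: 83 fps
# Qualcomm 8255 / Swin-Tiny: 66 fps
```

Note that these figures cover model inference only; face and eye detection add to the end-to-end latency of each frame.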

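Implications for IMS Applications

1. Transformer Architectures Become the New DMS Standard

Advantages over CNNs

Property              CNN                                Transformer                   Impact on IMS
Global dependencies   limited (local receptive field)    ✅ global attention           higher detection accuracy
Transfer learning     needs extensive fine-tuning        ✅ pretraining is effective   lower data requirements
Explainability        needs extra design                 ✅ native attention maps      supports functional-safety requirements
Compute cost          lower                              higher                        deployment must be optimized

IMS deployment recommendations

  1. High-end vehicles: ViT-Base/Swin-Base, for the highest accuracy
  2. Mid-range vehicles: Swin-Tiny/ViT-Tiny, to balance performance and cost
  3. Entry-level vehicles: MobileNet plus a lightweight attention module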

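2. Euro NCAP 2026 Compliance Strategy

Euro NCAP requirement               Conventional CNN  Transformer  Gap
Distraction detection accuracy >95%  92-94%            97-99%       +5%
Drowsiness detection accuracy >90%   88-91%            94-98%       +6%
Low-light performance >85%           78-82%            92-96%       +13%
Inference latency <50 ms             15-30 ms          12-45 ms     comparable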

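3. Functional Safety and Explainability

Applying CAM attention maps

# Generate an explainability report
def generate_explanation(attention_map, prediction):
    """Explain a detection result; returns None unless prediction is 'closed'.

    Assumes attention_map is a tensor normalized to [0, 1].
    """
    if prediction != 'closed':
        return None

    # Fraction of the map covered by high-attention pixels
    high_attn = attention_map > 0.7
    coverage = high_attn.sum() / high_attn.numel()

    explanation = {
        'prediction': 'eyes closed',
        # Peak attention, used here as a proxy; not a calibrated probability
        'confidence': attention_map.max().item(),
        'attention_coverage': coverage.item(),
        'reason': 'eyelid region detected as closed, pupil not visible'
    }
    return explanation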

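ISO 26262 compliance

  1. Attention maps provide evidence for each decision, supporting traceability requirements
  2. A confidence estimate is integrated; low confidence triggers a degraded mode
  3. Dual-channel redundancy: ViT and Swin run inference in parallel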
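One way the confidence check and the dual-channel redundancy could be combined is sketched below. Everything in it is an assumption made for illustration: the function name `fuse_channels`, the 0.5 decision boundary, the 0.8 confidence floor, and the policy of falling back to a degraded mode whenever the two channels disagree. The source does not specify a fusion rule.

```python
def fuse_channels(p_closed_vit, p_closed_swin, min_conf=0.8):
    """Fuse two closed-eye probabilities into (state, mode).

    Hypothetical policy: accept a result only if both channels agree and
    both are confident; otherwise report 'unknown' in degraded mode.
    """
    results = []
    for p in (p_closed_vit, p_closed_swin):
        label = "closed" if p >= 0.5 else "open"
        confidence = max(p, 1.0 - p)  # confidence in the chosen label
        results.append((label, confidence))

    (label_a, conf_a), (label_b, conf_b) = results
    if label_a != label_b or min(conf_a, conf_b) < min_conf:
        return "unknown", "degraded"
    return label_a, "normal"


print(fuse_channels(0.95, 0.90))  # ('closed', 'normal')
print(fuse_channels(0.95, 0.20))  # ('unknown', 'degraded'): channels disagree
print(fuse_channels(0.60, 0.70))  # ('unknown', 'degraded'): agreement, but low confidence
```

In degraded mode a system might, for example, skip the score update for that frame rather than count it as open or closed; that choice is also left open here.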

4. Real-Time Deployment Optimization

Quantization and pruning

# Dynamic quantization example
def quantize_model(model):
    model.eval()
    quantized = torch.quantization.quantize_dynamic(
        model,
        {nn.Linear},  # dynamic quantization targets Linear layers; LayerNorm stays in FP32
        dtype=torch.qint8
    )
    return quantized

# Reported comparison
# FP32: 18 ms, 2.1 GB
# INT8: 8 ms, 1.2 GB (latency down 55%, memory down 43%)

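Deployment optimization recommendations

Technique               Latency reduction  Accuracy loss  Target platform
FP16 quantization       30-40%             <0.1%          all GPUs
INT8 quantization       50-60%             0.3-0.5%       NPUs with INT8 support
Knowledge distillation  -                  <0.5%          all platforms
Model pruning           20-30%             0.5-1%         all platforms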

5. Multi-Task Extensibility

The Transformer architecture extends naturally to multi-task learning:

class MultiTaskDMS(nn.Module):
    """Multi-task DMS model with a shared backbone."""

    def __init__(self):
        super().__init__()  # required before registering submodules
        self.backbone = timm.create_model('swin_base_patch4_window7_224',
                                          num_classes=0)

        # One head per task; Swin-Base yields 1024-dimensional features
        self.eye_state_head = nn.Linear(1024, 2)   # open / closed
        self.gaze_head = nn.Linear(1024, 9)        # 9 gaze directions
        self.blink_head = nn.Linear(1024, 3)       # normal / fast / slow blink
        self.drowsiness_head = nn.Linear(1024, 4)  # KSS levels 0-3

    def forward(self, x):
        features = self.backbone(x)
        return {
            'eye_state': self.eye_state_head(features),
            'gaze': self.gaze_head(features),
            'blink': self.blink_head(features),
            'drowsiness': self.drowsiness_head(features)
        }

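Advantages

  • A single model covers several DMS functions, reducing system complexity
  • Shared features improve overall performance
  • Meets the multi-dimensional detection requirements of Euro NCAP 2026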

References

  1. Scientific Reports (2025). Real-time driver drowsiness detection using transformer architectures. DOI: 10.1038/s41598-025-02111-x

  2. Dosovitskiy et al. (2020). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. ICLR 2021.

  3. Liu et al. (2021). Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. ICCV 2021.

  4. Euro NCAP (2026). Assessment Protocol - Safe Driving v1.0.

  5. MRL Eye Dataset (2018). Machine Learning Research Lab.

  6. NTHU-DDD Dataset. National Tsing Hua University Driver Drowsiness Detection.


Source: https://dapalm.com/2026/06/21/2026-06-21-transformer-drowsiness-detection/
Author: Mars
Published: June 21, 2026