前言 2026 年 4 月 7 日,Google 正式发布 LiteRT-LM ——生产级边缘 LLM 推理框架。
核心数据:
指标
数值
内存占用
< 1.5GB(Gemma 4 E2B)
首 Token 延迟
0.3s(Samsung S26 Ultra)
Prefill 速度
3,808 tokens/s
Decode 速度
52 tokens/s
隐私合规
数据不出设备
一、60 秒快速开始 1.1 CLI 命令行(零代码) 1 2 3 4 5 6 7 8 uv tool install litert-lm litert-lm run \ --from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \ gemma-4-E2B-it.litertlm \ --prompt="What is the capital of France?"
说明:
模型自动从 Hugging Face 下载
首次运行会缓存硬件优化
后续加载秒级启动
1.2 Python API 1 2 3 4 5 6 7 8 9 10 11 from litert_lm import LiteRTEngine engine = LiteRTEngine("gemma-4-E2B-it.litertlm" ) conversation = engine.create_conversation() response = conversation.send_message("Explain edge AI for vehicles" )print (response)
二、Android Kotlin 完整代码 2.1 Gradle 依赖配置 1 2 3 4 5 6 7 8 dependencies { implementation("com.google.ai.edge.litertlm:litertlm-android:latest.release" ) implementation("org.jetbrains.kotlinx:kotlinx-coroutines-android:1.7.3" ) }
2.2 初始化引擎 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 import com.google.ai.edge.litertlm.*class LLMEngine (private val context: Context) { private var engine: Engine? = null suspend fun initialize (modelPath: String ) { val engineConfig = EngineConfig( modelPath = modelPath, backend = Backend.NPU( nativeLibraryDir = context.applicationInfo.nativeLibraryDir ), cacheDir = context.cacheDir.absolutePath ) engine = Engine(engineConfig) engine?.initialize() } fun close () { engine?.close() engine = null } }
2.3 对话管理 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 class VoiceAssistant (private val engine: Engine) { suspend fun chat (userInput: String ) : String { val config = ConversationConfig( systemInstruction = Contents.of( "你是车载语音助手。简洁回答,不超过 50 字。" ), samplerConfig = SamplerConfig( topK = 10 , topP = 0.95f , temperature = 0.7f ) ) return engine.createConversation(config).use { conversation -> val response = StringBuilder() conversation.sendMessageAsync(userInput) .collect { chunk -> response.append(chunk.toString()) } response.toString() } } }
2.4 GPU 后备方案 如果 NPU 不可用,自动回退到 GPU:
1 2 3 4 5 6 7 8 9 <application > <uses-native-library android:name ="libOpenCL.so" android:required ="false" /> <uses-native-library android:name ="libvndksupport.so" android:required ="false" /> </application >
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 val backend = when { hasNPU() -> Backend.NPU(context.applicationInfo.nativeLibraryDir) hasGPU() -> Backend.GPU() else -> Backend.CPU() }fun hasNPU () : Boolean { return try { File("/dev/ion" ).exists() || File("/dev/dma_heap" ).exists() } catch (e: Exception) { false } }fun hasGPU () : Boolean { return try { val activityManager = context.getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager activityManager.deviceConfigurationInfo.reqGlEsVersion >= 0x30000 } catch (e: Exception) { false } }
3.1 定义自定义工具 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 import com.google.ai.edge.litertlm.Toolimport com.google.ai.edge.litertlm.ToolParamclass VehicleToolSet : ToolSet { @Tool(description = "控制车辆空调" ) fun setAirConditioning ( @ToolParam(description = "目标温度,16-32 摄氏度" ) temperature: Int , @ToolParam(description = "风速档位,1-5" ) fanSpeed: Int = 3 , @ToolParam(description = "是否开启,默认 true" ) on: Boolean = true ) : Map<String, Any> { VehicleHal.setAC(temperature, fanSpeed, on) return mapOf( "success" to true , "temperature" to temperature, "fanSpeed" to fanSpeed ) } @Tool(description = "导航到目的地" ) fun navigateTo ( @ToolParam(description = "目的地名称或地址" ) destination: String , @ToolParam(description = "路线类型:fastest/shortest/highway" ) routeType: String = "fastest" ) : Map<String, Any> { val route = NavigationService.planRoute(destination, routeType) return mapOf( "destination" to destination, "distance_km" to route.distance, "eta_minutes" to route.eta ) } @Tool(description = "查询充电站" ) fun findChargingStations ( @ToolParam(description = "搜索半径,单位公里" ) radius: Double = 5.0 , @ToolParam(description = "充电桩类型:fast/normal" ) chargerType: String = "fast" ) : Map<String, Any> { val stations = ChargingService.search(radius, chargerType) return mapOf( "count" to stations.size, "stations" to stations.map { mapOf("name" to it.name, "distance" to it.distance) } ) } }
3.2 集成到对话 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 val conversation = engine.createConversation( ConversationConfig( systemInstruction = Contents.of( "你是车载助手。使用提供的工具控制车辆功能。" ), tools = listOf( tool(VehicleToolSet()) ), automaticToolCalling = true ) )val response = conversation.sendMessage("打开空调,调到 24 度" )
3.3 手动控制工具调用 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 val conversation = engine.createConversation( ConversationConfig( tools = listOf(tool(VehicleToolSet())), automaticToolCalling = false ) )val response = conversation.sendMessage("导航到最近的充电站" )if (response.toolCalls.isNotEmpty()) { for (toolCall in response.toolCalls) { println("模型请求调用: ${toolCall.name} " ) println("参数: ${toolCall.arguments} " ) val result = executeToolManually(toolCall.name, toolCall.arguments) val toolResponse = Message.tool( Contents.of(Content.ToolResponse(toolCall.name, result)) ) val finalAnswer = conversation.sendMessage(toolResponse) println(finalAnswer.text) } }
四、多模态支持(视觉+音频) 4.1 图像理解 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 val engineConfig = EngineConfig( modelPath = "gemma-3n-E2B-it.litertlm" , backend = Backend.CPU(), visionBackend = Backend.GPU() )val conversation = engine.createConversation()val response = conversation.sendMessage( Contents.of( Content.ImageFile("/sdcard/driver_image.jpg" ), Content.Text("分析这张图片中驾驶员的状态" ) ) )
4.2 音频输入 1 2 3 4 5 6 7 8 9 10 11 12 13 14 val engineConfig = EngineConfig( modelPath = "gemma-3n-E2B-it.litertlm" , backend = Backend.CPU(), audioBackend = Backend.NPU(context.applicationInfo.nativeLibraryDir) )val audioBytes = File("recording.wav" ).readBytes()val response = conversation.sendMessage( Contents.of( Content.AudioBytes(audioBytes), Content.Text("这段语音说了什么?" ) ) )
五、车载场景集成方案 5.1 DMS 疲劳干预 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 class DMSInterventionEngine ( private val context: Context, private val dmsService: DMSService ) { private lateinit var llmEngine: Engine suspend fun initialize () { llmEngine = Engine( EngineConfig( modelPath = "${context.filesDir} /gemma-4-E2B-it.litertlm" , backend = Backend.NPU(context.applicationInfo.nativeLibraryDir) ) ) llmEngine.initialize() } suspend fun onFatigueDetected (level: FatigueLevel ) { val prompt = when (level) { FatigueLevel.MILD -> "生成温和的疲劳提醒" FatigueLevel.MODERATE -> "生成中等强度的疲劳警告" FatigueLevel.SEVERE -> "生成紧急停车建议" } val response = llmEngine.createConversation( ConversationConfig( systemInstruction = Contents.of( "你是驾驶安全助手。根据疲劳程度生成个性化提醒。" + "语气:温和但坚定。长度:不超过 30 字。" ), samplerConfig = SamplerConfig(temperature = 0.8f ) ) ).use { it.sendMessage(prompt) } TTSService.speak(response.text) } }
5.2 完整语音助手架构 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 ┌────────────────────────────────────────────────────────────┐ │ 车载语音助手架构 │ ├────────────────────────────────────────────────────────────┤ │ │ │ ┌─────────┐ ┌─────────┐ ┌─────────────────────┐ │ │ │ 麦克风 │───►│ ASR │───►│ 文本 │ │ │ └─────────┘ └─────────┘ └──────────┬──────────┘ │ │ │ │ │ ▼ │ │ ┌─────────────────────────┐ │ │ │ LiteRT-LM Engine │ │ │ │ ┌───────────────────┐ │ │ │ │ │ Gemma 4 E2B │ │ │ │ │ │ QCS8255 NPU │ │ │ │ │ └───────────────────┘ │ │ │ │ ┌───────────────────┐ │ │ │ │ │ Tool Set : │ │ │ │ │ │ - 空调控制 │ │ │ │ │ │ - 导航 │ │ │ │ │ │ - 充电站查询 │ │ │ │ │ │ - 车辆状态 │ │ │ │ │ └───────────────────┘ │ │ │ └───────────┬─────────────┘ │ │ │ │ │ ▼ │ │ ┌─────────┐ ┌─────────┐ ┌─────────────────────┐ │ │ │ 扬声器 │◄───│ TTS │◄───│ 响应文本 │ │ │ └─────────┘ └─────────┘ └─────────────────────┘ │ │ │ └────────────────────────────────────────────────────────────┘
5.3 Qualcomm 8255 部署配置 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 fun getChipset () : String { return try { File("/proc/device-tree/compatible" ).readText() .split("\u0000" ) .firstOrNull { it.contains("qcs" , ignoreCase = true ) } ?: "unknown" } catch (e: Exception) { "unknown" } }val optimalConfig = when (getChipset()) { "qcs8255" -> EngineConfig( modelPath = modelPath, backend = Backend.NPU(nativeLibraryDir), cacheDir = "/data/local/tmp/litert_cache" ) "qcs8295" -> EngineConfig( modelPath = modelPath, backend = Backend.NPU(nativeLibraryDir), maxCacheSize = 2048 * 1024 * 1024 ) else -> EngineConfig( modelPath = modelPath, backend = Backend.CPU() ) }
六、性能基准 6.1 不同平台测试结果
平台
模型
Prefill
Decode
TTFT
内存
Samsung S26 Ultra
Gemma 4 E2B
3,808 t/s
52 t/s
0.3s
676MB
Pixel 9 Pro
Gemma 4 E2B
2,100 t/s
38 t/s
0.5s
720MB
Raspberry Pi 5
Gemma 3-1B
180 t/s
8 t/s
1.2s
1.1GB
Qualcomm QCS8255
Gemma 4 E2B
~2,500 t/s*
~35 t/s*
~0.4s*
~800MB*
注:QCS8255 为估算值,基于 Snapdragon 8 Gen 2 性能推算
6.2 模型文件大小
模型
参数量
文件大小
量化
Gemma 3-1B-IT
1B
0.6GB
INT4
Gemma 4 E2B
2.5B
0.79GB
INT4
Gemma 4 E4B
4B
1.2GB
INT4
Llama-3.2-3B
3B
0.9GB
INT4
七、模型下载 7.1 Hugging Face 模型库 1 2 3 4 5 6 7 8 wget https://huggingface.co/litert-community/gemma-4-E2B-it-litert-lm/resolve/main/gemma-4-E2B-it.litertlm wget https://huggingface.co/google/gemma-3n-E2B-it-litert-lm/resolve/main/gemma-3n-E2B-it-int4.litertlm wget https://huggingface.co/google/functiongemma-270m-it/resolve/main/functiongemma-270m-it.litertlm
7.2 Android 预加载模型 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 class ModelDownloader (private val context: Context) { suspend fun downloadModel (modelUrl: String ) : String { val modelFile = File(context.filesDir, "model.litertlm" ) if (modelFile.exists()) { return modelFile.absolutePath } withContext(Dispatchers.IO) { URL(modelUrl).openStream().use { input -> FileOutputStream(modelFile).use { output -> input.copyTo(output) } } } return modelFile.absolutePath } }
八、总结 对云 API 的优势
维度
云 API (GPT-4)
LiteRT-LM
延迟
200-500ms
< 100ms
成本
$0.03/1K tokens
$0(一次性硬件)
隐私
数据上云
数据不出设备
离线
需要网络
完全离线
合规
需额外审计
天然合规
对 IMS 开发的启示
DMS + LLM 结合 :疲劳检测 → AI 个性化干预
边缘部署可行 :Qualcomm NPU 可流畅运行 Gemma 4
Tool Calling 成熟 :可直接控制车辆功能
隐私合规友好 :数据不出设备,符合 EU AI Act
参考文档:
发布日期: 2026-04-11关键词: LiteRT-LM, Edge AI, 车载语音, LLM推理, Gemma 4, Android Kotlin