14 - Project 4: Multimodal Agent
LangChain
2026-03-08
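Project Goal
Build a multimodal agent that supports image understanding and voice interaction.
Features
- ✅ Image understanding
- ✅ Voice input/output
- ✅ Multimodal fusion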
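Multimodal fusion, the last item above, can be read as sending the user's text and an image together in a single message rather than in separate calls. The following is a minimal sketch under that assumption: `build_multimodal_content` is a hypothetical helper (not part of LangChain), and the content-part layout follows the OpenAI-style `text` / `image_url` convention with a base64 data URL.

```python
import base64


def build_multimodal_content(question, image_bytes, mime_type="image/png"):
    """Combine a text question and one image into a single content list.

    Hypothetical helper for illustration. The list uses the OpenAI-style
    layout: one "text" part and one "image_url" part holding a data URL.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return [
        {"type": "text", "text": question},
        {
            "type": "image_url",
            "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
        },
    ]


# Placeholder bytes stand in for the contents of a real image file.
content = build_multimodal_content("What is in this picture?", b"\x89PNG")
print(content[0]["text"])
print(content[1]["image_url"]["url"].split(",", 1)[0])
```

With LangChain, a list like this would typically be wrapped as `HumanMessage(content=content)` and passed to `llm.invoke([...])`, so the model sees the question and the image in the same turn.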
Core Code
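```python
import base64

import speech_recognition as sr
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from PIL import Image

# Image understanding
# Note: OpenAI has deprecated this preview model; swap in a current
# vision-capable model (for example "gpt-4o") if the request is rejected.
llm = ChatOpenAI(model="gpt-4-vision-preview")


def analyze_image(image_path):
    # Use Pillow to detect the image format so the data URL carries the right MIME type.
    with Image.open(image_path) as image:
        mime_type = Image.MIME.get(image.format, "image/png")
    # The API expects a URL or a base64 data URL, not a local file path.
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    message = HumanMessage(content=[
        {"type": "text", "text": "Describe this image"},
        {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
    ])
    return llm.invoke([message]).content


# Speech recognition
def voice_to_text():
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        audio = recognizer.listen(source)
    # "zh-CN" is the language code from the original lesson; change it for other languages.
    return recognizer.recognize_google(audio, language="zh-CN")


# Multimodal interaction
while True:
    mode = input("Select mode (1: text, 2: voice, 3: image): ")
    if mode == "1":
        text = input("You: ")
    elif mode == "2":
        text = voice_to_text()
    elif mode == "3":
        image_path = input("Image path: ")
        text = analyze_image(image_path)
    else:
        continue  # Unrecognized mode: ask again instead of using an undefined `text`
    response = llm.invoke(text)
    print(f"AI: {response.content}")
```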
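The code above covers voice input only, while the feature list also mentions voice output. One possible way to add it is sketched below, assuming the third-party `pyttsx3` library for offline text-to-speech. The names `speak` and `RecordingEngine` are hypothetical and exist only for this illustration; the stand-in engine lets the function be exercised without an audio device.

```python
def speak(text, engine=None):
    """Read text aloud and return the engine that was used.

    `engine` is any object with say() and runAndWait(). When omitted, a
    pyttsx3 engine is created (assumes pyttsx3 is installed and an audio
    device is available).
    """
    if engine is None:
        import pyttsx3  # imported lazily so the sketch runs without it

        engine = pyttsx3.init()
    engine.say(text)      # queue the utterance
    engine.runAndWait()   # block until the queue has been processed
    return engine


class RecordingEngine:
    """Hypothetical stand-in that records calls instead of producing sound."""

    def __init__(self):
        self.spoken = []
        self.ran = False

    def say(self, text):
        self.spoken.append(text)

    def runAndWait(self):
        self.ran = True


demo = speak("Hello", engine=RecordingEngine())
print(demo.spoken)
```

In the interaction loop, calling `speak(response.content)` after the `print` would read each reply aloud in addition to displaying it.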
Lesson Summary
- Image understanding with GPT-4 Vision
- Speech recognition and synthesis
- Multimodal fusion
Next lesson: 15 - Deployment and Optimization