Llama 3.2 11B Vision Instruct

About this model

Llama 3.2 11B Vision is an 11-billion-parameter multimodal model designed for work that combines visual and textual data. It excels at tasks such as image captioning and...

Technical Specifications

Provider

Capabilities

Vision

Can process and understand images

File Support

Can read PDF, DOCX, XLSX & more

131K Context

Large context window for long documents

Vision (OpenRouter)

OpenRouter reports vision support