
Xiaomi
xiaomi/mimo-v2-omni
23 credits / gen
MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability - visual grounding, multi-step...
Provider: Xiaomi
Type: Chat
Context Window: 262,144 tokens
Pricing: 23 credits
Vision: Can process and understand images
File Support: Can read PDF, DOCX, XLSX & more
Reasoning: Chain-of-thought reasoning exposed
262K Context: Large context window for long documents
Vision (OR): OpenRouter reports vision support