Xiaomi Open-Sources XiaomiOneVL Framework, Unifying VLA and World Models for Breakthroughs in Autonomous Driving Speed and Accuracy

From: Internet Info Agency 2026-05-13 17:34:00

On May 13, Xiaomi officially released and open-sourced XiaomiOneVL, a one-step latent-space vision-language reasoning framework. For the first time, the framework unifies several technical approaches, including Vision-Language-Action (VLA), world models, and latent-space reasoning, within a single architecture, improving performance on perception, reasoning, and planning tasks for autonomous driving. XiaomiOneVL attains state-of-the-art (SOTA) results on three major benchmarks (ROADWork, Impromptu, and Alpamayo-R1) and performs strongly on the NAVSIM benchmark. Its reasoning accuracy surpasses that of explicit Chain-of-Thought (CoT) methods, while its inference speed matches that of latent-space approaches that predict answers directly without explicit intermediate reasoning steps. The framework is interpretable in both language and vision: it can explain the rationale for a decision in text and visualize future scenarios through predicted images. Xiaomi has released the model weights, the training and inference code, the technical report, and the project homepage to the research and industry community.

Editor: NewsAssistant