From: Internet Info Agency
2026-05-13 20:44:00

Xiaomi recently unveiled XiaomiOneVL, a one-step latent-space vision-language reasoning framework that, the company says, is the first in the industry to unify several technical approaches: Vision-Language-Action (VLA) models, world models, and latent-space reasoning.

Through a dual-supervision mechanism that combines "language-based reasoning" with "visual future prediction," XiaomiOneVL builds interpretability and future-scenario prediction into the latent-space reasoning process. According to Xiaomi, it surpasses explicit Chain-of-Thought (CoT) methods in reasoning accuracy while matching the inference speed of latent-space CoT approaches that output answers directly.

The framework rests on three core technologies: the model reasons in an "internal language," it can predict future visual frames, and it compresses the entire reasoning process into a single step. These innovations are intended to improve autonomous-driving systems' understanding of both the current scene and future spatiotemporal causal relationships, enabling higher-quality decision-making.

Xiaomi has fully open-sourced the model weights, training code, and inference code of XiaomiOneVL, making them available to developers and researchers worldwide to accelerate iteration on large autonomous-driving models.
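To make the dual-supervision idea concrete, here is a minimal NumPy sketch of a training objective of this general shape: a single latent reasoning step whose output feeds two heads, one supervised with a language (cross-entropy) term and one with a future-frame (regression) term. All dimensions, weights, and loss weights below are illustrative assumptions, not details from the announcement.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- assumptions for illustration only.
d_latent, vocab, d_frame = 16, 32, 8

# One-step latent reasoning: a single projection of fused
# vision-language features into a reasoning latent z (sketch).
W_reason = rng.normal(0, 0.1, (d_latent, d_latent))
features = rng.normal(size=d_latent)
z = np.tanh(W_reason @ features)            # single reasoning step

# Head 1: "internal language" -- token logits supervised against a
# reference reasoning token (language-based reasoning supervision).
W_lang = rng.normal(0, 0.1, (vocab, d_latent))
logits = W_lang @ z
target_token = 5                            # dummy reference token id
log_probs = logits - np.log(np.exp(logits).sum())
loss_lang = -log_probs[target_token]        # cross-entropy term

# Head 2: visual future prediction -- regress a future visual latent.
W_vis = rng.normal(0, 0.1, (d_frame, d_latent))
pred_frame = W_vis @ z
true_frame = rng.normal(size=d_frame)       # dummy future-frame latent
loss_vis = np.mean((pred_frame - true_frame) ** 2)  # MSE term

# Dual-supervision objective: weighted sum of both terms
# (the weights are assumptions, not published values).
lam_lang, lam_vis = 1.0, 0.5
loss = lam_lang * loss_lang + lam_vis * loss_vis
print(float(loss) > 0.0)
```

The key design point the sketch illustrates is that both supervision signals flow through the same one-step latent z, so at inference time the model can skip the explicit token-by-token reasoning trace while still having been trained to encode it.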