XPeng Unveils X-Mind Framework to Enable Onboard Proactive Thinking

From: Internet Info Agency 2026-06-30 09:52:00

At the CVPR 2026 Workshop on Embodied Intelligence Foundation Model Deployment, Liu Xianming, Head of XPeng Group’s General Intelligence Center, outlined three core capabilities that he said an excellent world model needs: proactive reasoning, controllable generation, and long-horizon prediction. He also officially unveiled the X-Mind technical framework, designed to resolve the tension between cognitive reasoning and real-time computation on the limited onboard compute available to autonomous driving systems. Before generating actions, X-Mind performs instance-level visual chain-of-thought reasoning through explicit spatiotemporal prediction, giving the system forward-looking judgment akin to that of an experienced human driver. The framework comprises three key technical components.
First, it constructs a “thought sketch”: a deeply compressed autoencoder condenses the next 12 frames of world states into just 96 tokens. This representation preserves critical semantic information, including road topology, traffic light states, and navigation intent, within a BEV (Bird’s Eye View) layout enriched with abstract driving priors.
Second, it employs a recursive block diffusion mechanism that internalizes future prediction across multiple layers of a large language model, so that high-quality forecasting is achieved in a single forward pass. The resulting image generation quality (FID 9.59, where lower is better) significantly outperforms single-step denoising approaches (FID 67.30) while maintaining nearly identical inference latency.
Third, X-Mind enables chain-of-thought visualization, offering intuitive insight into how the model reasons about obstacle occupancy and lane connectivity, which supports algorithm validation and fosters user trust.
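
For readers who want to put the quoted figures in perspective, the following minimal sketch works through the arithmetic. It uses only the numbers reported in the article; the function and variable names are illustrative assumptions, and the per-frame figure is only an average, since the article does not say how the 96 tokens are distributed across the 12 frames.

```python
# Figures as quoted in the article; nothing here comes from XPeng's code.
FUTURE_FRAMES = 12                      # frames summarized by the "thought sketch"
SKETCH_TOKENS = 96                      # tokens in the compressed representation
FID_RECURSIVE_BLOCK_DIFFUSION = 9.59    # reported FID for X-Mind's approach
FID_SINGLE_STEP_DENOISING = 67.30       # reported FID for single-step denoising


def average_tokens_per_frame(tokens: int, frames: int) -> float:
    """Average token budget per predicted frame (hypothetical helper)."""
    return tokens / frames


def fid_ratio(baseline: float, improved: float) -> float:
    """How many times larger the baseline FID is (lower FID is better)."""
    return baseline / improved


tokens_per_frame = average_tokens_per_frame(SKETCH_TOKENS, FUTURE_FRAMES)
ratio = fid_ratio(FID_SINGLE_STEP_DENOISING, FID_RECURSIVE_BLOCK_DIFFUSION)

print(f"Average tokens per frame: {tokens_per_frame:.0f}")
print(f"Single-step FID is {ratio:.2f}x the recursive block diffusion FID")
```

On these numbers, the sketch averages 8 tokens per future frame, and the single-step baseline's FID is roughly seven times that of the recursive block diffusion approach.
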
Unlike X-Foresight, which XPeng introduced earlier and which focuses primarily on predicting future visual scenes, X-Mind is designed to provide a “thinking canvas” for Vision-Language-Action (VLA) models, emphasizing efficient, human-like deliberation before action. Together, X-Mind and X-Foresight give XPeng’s VLA models physical common sense and interpretable reasoning capabilities. Trained on hundreds of millions of real-world driving frames, X-Mind shows substantially lower lateral and longitudinal trajectory prediction errors than conventional VLA models, with particularly notable improvements in safety and regulatory compliance in long-tail scenarios. Its low inference latency further supports its feasibility for mass deployment on automotive-grade chips.
XPeng stated that X-Mind, together with X-World and X-Foresight, forms the foundational R&D portfolio of its physics-based AI foundation models, completing the first piece of the technical puzzle for proactive onboard reasoning. The company plans to continue exploring the upper limits of scaling laws and to extend these capabilities to broader embodied intelligence applications.

Editor: NewsAssistant