XPeng Unveils X-Mind Framework to Enable Onboard Proactive Thinking

From: Internet Info Agency, 2026-06-30 09:52:00

At the CVPR 2026 Workshop on Embodied Intelligence Foundation Model Deployment, Liu Xianming, Head of XPeng Group’s General Intelligence Center, outlined three core capabilities that an excellent world model needs: proactive reasoning, controllable generation, and long-horizon prediction. He also officially unveiled the X-Mind technical framework, which is designed to resolve the tension between cognitive reasoning and real-time computation in autonomous driving systems, where onboard computing resources are limited. Before generating actions, X-Mind performs instance-level visual chain-of-thought reasoning through explicit spatiotemporal prediction, giving the system forward-looking judgment similar to that of an experienced human driver.

The framework has three key technical components.

First, it builds a “thought sketch”: a deeply compressed autoencoder condenses the next 12 frames of future world states into just 96 tokens. This representation preserves critical semantic information, including road topology, traffic light states, and navigation intent, within a BEV (Bird’s Eye View) layout enriched with abstract driving priors.

Second, it uses a recursive block diffusion mechanism that internalizes future prediction across multiple layers of a large language model, so that high-quality forecasting is achieved in a single forward pass. The resulting image generation quality (FID 9.59) is significantly better than that of single-step denoising approaches (FID 67.30; lower FID is better), while inference latency remains nearly identical.

Third, X-Mind visualizes its chain of thought, giving intuitive insight into how the model reasons about obstacle occupancy and lane connectivity. This supports algorithm validation and helps build user trust.
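
X-Mind’s implementation is not public, so the following is only a toy Python sketch, under stated assumptions, of the general block diffusion pattern described in the second component: future values are generated block by block, each block starts from noise and is refined over several steps while conditioned on everything generated so far, and the whole loop runs inside one function call (loosely analogous to unrolling refinement steps across model layers). The function names `toy_denoise_step` and `block_diffusion_forecast`, the noise scale, and the constant-velocity “denoiser” are hypothetical stand-ins for a learned model, not XPeng’s method.

```python
import random


def toy_denoise_step(block, context):
    """Hypothetical stand-in for a learned denoiser.

    It extrapolates the context at constant velocity and moves each noisy
    value halfway toward that extrapolation, so every call removes half of
    the remaining deviation from the extrapolated guess."""
    velocity = context[-1] - context[-2]
    guess = [context[-1] + velocity * (i + 1) for i in range(len(block))]
    return [g + 0.5 * (b - g) for b, g in zip(block, guess)]


def block_diffusion_forecast(history, num_blocks, block_size, steps, seed=0):
    """Generate num_blocks * block_size future values in a single call.

    Each block starts as Gaussian noise, is refined `steps` times while
    conditioned on the history plus all earlier blocks, and is then appended
    to the context for the next block."""
    rng = random.Random(seed)
    context = list(history)
    forecast = []
    for _ in range(num_blocks):
        block = [rng.gauss(0.0, 5.0) for _ in range(block_size)]
        for _ in range(steps):  # refinement steps, loosely analogous to stacked layers
            block = toy_denoise_step(block, context)
        forecast.extend(block)
        context.extend(block)  # later blocks condition on earlier ones
    return forecast


if __name__ == "__main__":
    history = [0.0, 1.0, 2.0, 3.0]       # past values moving at constant speed
    truth = [4.0 + i for i in range(8)]  # ideal continuation for 2 blocks of 4
    for steps in (1, 8):
        out = block_diffusion_forecast(history, num_blocks=2, block_size=4, steps=steps)
        err = max(abs(o - t) for o, t in zip(out, truth))
        print(f"steps={steps}: max abs error {err:.3f}")
```

With the same seed, the one-step run keeps half of each block’s initial deviation, while eight refinement steps remove almost all of it. This only loosely echoes the single-step versus multi-step quality gap reported above; it says nothing about how X-Mind actually achieves it.
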
Unlike X-Foresight, which XPeng introduced earlier and which focuses primarily on predicting future visual scenes, X-Mind is designed specifically to provide a “thinking canvas” for Vision-Language-Action (VLA) models, emphasizing efficient, human-like deliberation before acting. Together, X-Mind and X-Foresight give XPeng’s VLA models physical common sense and interpretable reasoning capabilities.

Trained on hundreds of millions of real-world driving frames, X-Mind shows substantially lower lateral and longitudinal trajectory prediction errors than conventional VLA models, with particularly notable improvements in safety and regulatory compliance in long-tail scenarios. Its low inference latency further confirms its feasibility for mass deployment on automotive-grade chips.

XPeng stated that X-Mind, together with X-World and X-Foresight, forms the foundational R&D portfolio of its physics-based AI foundation models, completing the first piece of the technical puzzle for proactive onboard reasoning. The company plans to keep exploring the upper limits of scaling laws and to extend these capabilities to broader embodied intelligence applications.
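
XPeng reports lateral and longitudinal trajectory errors but does not say how it computes them. A common convention, assumed here rather than taken from the source, projects each waypoint’s displacement error onto the ground-truth heading (longitudinal) and onto its left-pointing normal (lateral). The helper names `lateral_longitudinal_errors` and `mean_abs` are hypothetical.

```python
import math


def lateral_longitudinal_errors(pred, truth):
    """Split per-waypoint displacement errors into lateral and longitudinal parts.

    pred, truth: equal-length lists of (x, y) waypoints, at least 2 points.
    The heading at each waypoint comes from the ground-truth segment to the
    next point; the last waypoint reuses the previous segment's heading."""
    if len(pred) != len(truth) or len(truth) < 2:
        raise ValueError("need two equal-length trajectories with >= 2 points")
    lat, lon = [], []
    for i, ((px, py), (tx, ty)) in enumerate(zip(pred, truth)):
        j = min(i, len(truth) - 2)
        hx = truth[j + 1][0] - truth[j][0]
        hy = truth[j + 1][1] - truth[j][1]
        norm = math.hypot(hx, hy)
        if norm == 0:
            raise ValueError("ground-truth trajectory has repeated points")
        ux, uy = hx / norm, hy / norm  # unit heading vector
        dx, dy = px - tx, py - ty      # displacement error at this waypoint
        lon.append(dx * ux + dy * uy)  # component along the heading
        lat.append(-dx * uy + dy * ux) # component along the left normal
    return lat, lon


def mean_abs(values):
    """Mean absolute value, e.g. to summarize lateral or longitudinal error."""
    return sum(abs(v) for v in values) / len(values)


if __name__ == "__main__":
    truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    pred = [(0.0, 0.5), (1.2, 0.0), (2.0, -0.3)]
    lat, lon = lateral_longitudinal_errors(pred, truth)
    print("mean |lateral|:", mean_abs(lat), "mean |longitudinal|:", mean_abs(lon))
```

Reporting the two components separately, rather than a single Euclidean displacement error, distinguishes drifting out of the lane (lateral) from braking or accelerating at the wrong time (longitudinal), which matter differently for safety and compliance.
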

Editor: NewsAssistant
