X Square Robot today announced the open-source release of WALL-WM, a World Action Model for general-purpose embodied AI. The model is designed around a simple idea: robot world models should learn ...
The global AI video analytics market is on track to reach $17 billion by 2031, growing at over 22% annually. Behind the ...
Over the past few decades, roboticists worldwide have introduced increasingly advanced robots that can understand human ...
As a core component of the general embodied intelligence platform “Wise Kaiwu,” Pelican-Unify 1.0 has achieved world-leading ...
First unveiled at CES 2026, the Narwal Flow 2 immediately captured widespread media attention and earned multiple prestigious awards. Today, with its official release, Narwal brings this highly ...
Featuring unlimited object recognition, a 140°F self-cleaning track mopping system, and a reimagined premium design for smarter, more efficient home cleaning.
InternVL3.5 | Foundation Model | reading_notes/2025-08_InternVL35.md
Qwen2.5-VL | Foundation Model | reading_notes/2025-02_Qwen25-VL.md
Janus-Pro | Unified Generation | reading ...
A scientific dispute spanning six decades about fundamental mechanisms of visual perception in mammals has now been settled. Researchers at TUM have succeeded in observing the visual information flow ...
Multimodal Large Language Models (MLLMs) have made impressive progress in connecting vision and language, but they still struggle with spatial understanding and viewpoint-aware reasoning. Recent ...
According to Stanford AI Lab, VAGEN is a reinforcement learning framework that teaches vision language model agents to construct internal world models via explicit visual state reasoning, enabling ...
We propose U-VLM, which enables hierarchical vision-language modeling in both training and architecture: (1) progressive training from segmentation to classification to report generation, and (2) ...