Build a lightweight embodied agent, inspired by vision-language-action (VLA) models, that learns to perceive, plan, predict, and replan directly from pixel observations in a grid-world environment.