AI2's new open-source web agent, MolmoWeb, navigates the web using only screenshots and outperforms larger proprietary systems.
Researchers are exploring how to build vision-guided web AI agents using the MolmoWeb-4B model, which interprets screenshots to navigate and interact with websites without relying on HTML parsing.