Introducing the Page Map Tool for Faster AI Agent Navigation

Akram H. S., Founder & CTO

Today, we are thrilled to announce a significant enhancement to Owl Browser's automation capabilities: the `browser_get_page_map` tool.

This new tool uses our custom browser renderer to map out all actionable, interactive elements on a webpage. For AI agents that rely on text-based Large Language Models (LLMs) to navigate the web, this means the model sees a compact list of elements it can act on rather than the full page markup.

How It Works

The performance gain comes from our custom DOM renderer, which includes a precise registration system for elements that need to be displayed or interacted with. As each element is rendered, it must announce its position and receive a unique ID. The registration stays robust even when elements shift or move dynamically.
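To make the registration idea concrete, here is a minimal sketch in Python. It is not Owl Browser's implementation: the `ElementRegistry` class, its method names, and the `Rect` fields are all hypothetical, chosen only to illustrate how a renderer could hand out unique IDs and keep positions current as elements move.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Rect:
    """Hypothetical bounding box of a rendered element, in pixels."""
    x: int
    y: int
    width: int
    height: int


class ElementRegistry:
    """Hypothetical registry mapping each element ID to its latest position."""

    def __init__(self):
        self._next_id = count(1)   # IDs are handed out once and never reused
        self._positions = {}

    def register(self, rect):
        """Called when an element is first rendered; returns its unique ID."""
        element_id = next(self._next_id)
        self._positions[element_id] = rect
        return element_id

    def update(self, element_id, rect):
        """Called when an element moves; the ID stays the same."""
        if element_id not in self._positions:
            raise KeyError(f"unknown element id: {element_id}")
        self._positions[element_id] = rect

    def position_of(self, element_id):
        return self._positions[element_id]


registry = ElementRegistry()
button_id = registry.register(Rect(10, 20, 80, 24))

# A layout shift moves the button down the page; its ID does not change.
registry.update(button_id, Rect(10, 140, 80, 24))
```

Because the agent refers to an element by ID rather than by coordinates, a lookup after the layout shift still resolves to the same button at its new position.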

Because of this deep integration, `browser_get_page_map` can flawlessly extract only the agent-actionable elements from the DOM. It cuts through the structural layout, styling wrappers, and extraneous noise, presenting the LLM with exactly what it needs to perform tasks.

Furthermore, developers can tune the verbosity of the extracted map, ensuring the agent gets the optimal amount of context without being overwhelmed.
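As a rough illustration of what filtering and verbosity tuning could look like, the Python sketch below reduces a list of rendered elements to the actionable ones and emits more or less detail per element. The element fields, the `ACTIONABLE_ROLES` set, and the verbosity levels (`"minimal"` and `"full"`) are assumptions made for this example; the actual options and output format of `browser_get_page_map` are not specified in this post.

```python
# Hypothetical set of roles an agent can act on.
ACTIONABLE_ROLES = {"button", "link", "textbox", "checkbox", "select"}


def build_page_map(elements, verbosity="minimal"):
    """Keep only actionable elements; include geometry when verbosity is 'full'."""
    if verbosity not in ("minimal", "full"):
        raise ValueError(f"unsupported verbosity: {verbosity}")

    page_map = []
    for el in elements:
        if el["role"] not in ACTIONABLE_ROLES:
            continue  # skip layout wrappers and other non-interactive nodes
        entry = {"id": el["id"], "role": el["role"], "label": el["label"]}
        if verbosity == "full":
            entry["rect"] = el["rect"]
        page_map.append(entry)
    return page_map


# Made-up elements for illustration only.
elements = [
    {"id": 1, "role": "div", "label": "", "rect": (0, 0, 1280, 720)},
    {"id": 2, "role": "textbox", "label": "From", "rect": (40, 120, 200, 32)},
    {"id": 3, "role": "button", "label": "Search flights", "rect": (40, 200, 160, 40)},
]

for entry in build_page_map(elements):
    print(entry)
```

In this sketch the wrapper `div` is dropped, and the default level omits geometry so the prompt stays short; passing `verbosity="full"` adds each element's bounding box for agents that need spatial context.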

Benchmarking: Text vs Vision Models

To test the efficacy of this new tool, we benchmarked how long agents took to complete a complex web task: booking a specific flight.

  • Text Model (zai-org/glm-4.7-flash): Completed the task in under 2 minutes using the structured output of `browser_get_page_map`.
  • Vision Model (zai-org/glm-4.6v-flash): Took approximately 6 minutes relying on visual analysis and standard DOM traversal techniques.

Given a clean, structured representation of interactive regions, text models can complete tasks significantly faster than computationally expensive vision models: roughly three times faster in this benchmark.

This feature is currently rolling out in beta, and we can't wait to see the incredibly fast LLM agents developers build with it!
