Protostar - Browser Copilot

Overview

An AI Agent that brings natural language control to web browsing. Built as a Chrome extension, it lets users interact with websites using conversational instructions.

Key Features

Natural Language Control: Cursor-like chat interaction with the browser
Async Workflow: The Agent works asynchronously, independent of the user, with zero interruption
Multi-Agent System: Concurrent AI agents working together on complex tasks
Chrome Extension: Fits into your existing Chromium browser. No need to install a new browser ;)

Capability

Here's a long-horizon task that demonstrates the agent's capabilities.

Find four different laptops on Amazon and add them to cart without protection plans — one for a professional dad who edits 4K videos and needs powerful performance, one for a business-minded mom who wants a sleek and lightweight laptop for emails and meetings, one for a college-age daughter who studies design and needs a stylish portable machine for digital art and photo editing, and one for a high-school son who plays games, codes, and streams — each from a different brand, all currently available for purchase with their specs, prices, and ratings.

To complete this task, the agent needs to:

Research and find a suitable laptop for each family member
Navigate to the Amazon product page of each product
Add each item to the cart (a surprisingly tricky step!)

This is an interesting example because it exposes a subtle flaw in how the agent decides whether an "action" succeeded. Here, an "action" means a "tool call" plus its "intended effect".

The models I've tested on this task (GPT-4.1 and GPT-oss-120B) manage to select appropriate products and navigate to their Amazon product pages. However, they struggle to add the items to the cart.

The models assume that clicking the "Add to cart" button guarantees the product ends up in the user's cart. That is not always the case, though. Adding certain products to the cart triggers a popup offering a protection plan, and this popup has to be handled (declined or closed) before the product is actually added to the cart. The models only see that the "Add to cart" click tool call succeeded, and move on to the next product.

Amazon popup
Webpage content and behaviour are outside the Agent developer's control. Therefore, in a webpage environment, a successful tool call should not be mistaken for a successful intended effect. Contrast this with a file edit made by a coding agent inside a code editor, where a successful edit tool call guarantees the intended change to the actual file.

More advanced models that are RL-tuned for browser tasks probably won't make these mistakes. For GPT-4.1 and GPT-oss-120B, this issue was fixed by restructuring the planning and execution pattern in the Agent harness so that it closely scrutinizes the effects of key actions, performing simpler follow-up actions to verify them.
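A minimal TypeScript sketch of the verify-after-act idea, under stated assumptions. Everything here is hypothetical and not the Protostar harness or any Chrome API: `VerifiedAction`, `runVerified`, and `FakePage`, which is a toy stand-in for an Amazon product page. Each action pairs a tool call with a cheap follow-up observation (reading the cart count). It counts as successful only if the intended effect is actually observed. For brevity the sketch is synchronous; real browser tool calls would be asynchronous.

```typescript
// Hypothetical sketch: an "action" = tool call + intended effect, checked separately.
// Synchronous for brevity; real browser tool calls would be async.

interface ToolResult {
  ok: boolean; // did the tool call itself execute without error?
}

interface VerifiedAction<S, O> {
  name: string;
  execute: (state: S) => ToolResult; // the tool call
  observe: (state: S) => O; // simple follow-up read used for verification
  intendedEffect: (before: O, after: O) => boolean;
}

type Outcome = "effect_confirmed" | "tool_failed" | "effect_missing";

function runVerified<S, O>(action: VerifiedAction<S, O>, state: S): Outcome {
  const before = action.observe(state);
  const result = action.execute(state);
  if (!result.ok) return "tool_failed";
  // A successful tool call is not treated as success on its own:
  // observe the page again and check that the intended effect happened.
  const after = action.observe(state);
  return action.intendedEffect(before, after) ? "effect_confirmed" : "effect_missing";
}

// --- Simulated page (hypothetical stand-in for a real product page) ---
interface FakePage {
  cartCount: number;
  protectionPopupOpen: boolean;
  pendingItems: number;
}

const readCartCount = (page: FakePage): number => page.cartCount;

const clickAddToCart: VerifiedAction<FakePage, number> = {
  name: "click 'Add to cart'",
  execute: (page) => {
    // The click "succeeds", but the page intercepts it with a protection-plan popup,
    // so the item is held as pending instead of being added to the cart.
    page.protectionPopupOpen = true;
    page.pendingItems += 1;
    return { ok: true };
  },
  observe: readCartCount,
  intendedEffect: (before, after) => after > before,
};

const declineProtectionPlan: VerifiedAction<FakePage, number> = {
  name: "decline protection plan",
  execute: (page) => {
    if (!page.protectionPopupOpen) return { ok: false }; // nothing to click
    page.protectionPopupOpen = false;
    page.cartCount += page.pendingItems; // pending items land in the cart
    page.pendingItems = 0;
    return { ok: true };
  },
  observe: readCartCount,
  intendedEffect: (before, after) => after > before,
};

const page: FakePage = { cartCount: 0, protectionPopupOpen: false, pendingItems: 0 };
const first = runVerified(clickAddToCart, page);
// The click tool call succeeded, but the cart count did not change.
const second: Outcome =
  first === "effect_missing" && page.protectionPopupOpen
    ? runVerified(declineProtectionPlan, page)
    : first;
console.log(first, second, page.cartCount); // effect_missing effect_confirmed 1
```

In this toy version the recovery step (decline the popup, then re-check) is hard-coded. In an agent harness, the "effect_missing" outcome would instead be fed back to the model so it can re-plan.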