product agents

The agent browses. The human watches. Same browser.

BCTRL Team ·

Here’s a complete working session between a person and an AI agent, and the thing to notice is what’s missing from it.

The agent starts a browser:

TypeScript
const runtime = await bctrl.runtimes.create({
  type: "browser",
  name: "research",
});
const { connectUrl, runId } = await bctrl.runtimes.start(runtime.id);

It connects Playwright to connectUrl and gets to work — or skips the framework entirely and drives through hosted invocations. Twenty minutes in, the person types: “show me what you’re doing.” The agent runs one more call:

TypeScript
const live = await bctrl.runs.live(runId, { control: "input" });
// -> https://...  (expiring, tokenized)

…and pastes the URL into the chat. The person opens it in a browser tab and is looking at the agent’s browser, live. The site has thrown up a 2FA prompt; the person clicks into the embedded page, types the code from their phone, and goes back to lunch. The agent continues. Afterward there’s a replay of the whole session and a timeline of every step, because the run was recorded from the first navigation without anyone asking it to be.

What’s missing: an app to install, a screen-share to schedule, a “human-in-the-loop platform” to integrate, any moment where the agent stops and the state gets handed across a boundary. The collaboration is two API calls and a URL.

Why this is an API property, not a feature

The load-bearing fact is small and easy to miss: a runtime’s connectUrl is stable for the life of the run and supports concurrent clients. The agent’s CDP session, a hosted invocation, the recording pipeline, and the live view aren’t taking turns on the browser through some session-ownership protocol — they’re all attached to the same one, at the same time. The live view isn’t a video of the browser; it is the browser, which is why a human can act through it and the agent’s session doesn’t even blink.

Once observation and control are properties of the runtime rather than features of a client, the collaboration patterns stop needing to be built:

An agent that hits a wall doesn’t fail the job — it mints a takeover URL and puts it in the alert. The person resolves the wall and the job resumes. (While a human drives, invocations report runtime.controller_busy — the agent knows to wait, not wrestle.)

A person who doesn’t trust a new automation yet doesn’t read its code — they watch its first ten runs in an iframe, then stop watching, the way you’d onboard a junior colleague.

A team reviewing what an agent did last night doesn’t grep logs — they scrub the replay and read the activity timeline, where five hundred keystrokes are one row that says what was typed and where.

And because URLs are the medium, the human side needs no SDK, no account, no context. Anyone who can open a link can supervise an agent. That’s the lowest possible bar for the human half of human-AI collaboration, and we think that’s exactly where the bar should be.

🎬 Content TODO: one continuous ~45s screen recording of the actual scenario — chat on the left (agent working, user asks to see, agent replies with a URL), browser on the right (user opens it, watches, types a 2FA code, closes the tab). No cuts; the lack of cuts is the argument.

The shape of what’s coming

Most of the industry is building agent autonomy as an either/or: either the human drives or the agent does, with “handoff” as a feature on the boundary. We think the boundary is the mistake. Browsers have always been multiplayer at the protocol level — CDP never cared how many clients attached — and agents make that property matter for the first time.

So we built the runtime as the shared object: the agent automates it, the human observes or corrects it, the record of it accrues to the run, and every one of those capabilities is a plain API call away from whichever side needs it. The cookbook has the working versions — live embeds, replays, event streams — and the free tier is enough to run the scenario at the top of this post yourself, including the part where you take the wheel.